Composo delivers deterministic, accurate evaluation for LLM applications through purpose-built generative reward models. Unlike LLM-as-judge approaches, which can score the same response inconsistently, our specialized models provide consistent, precise scores you can trust, using just a single-sentence criterion.
Quickstart
Get up and running with Composo in under 5 minutes. This guide will help you evaluate your first LLM response and understand how Composo delivers deterministic, accurate evaluations.
Step 1: Create Your Account
Sign up for a Composo account at platform.composo.ai.
Step 2: Generate Your API Key
- Navigate to Profile → API Keys in the dashboard
- Click Generate New API Key
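Once you have a key, you may prefer not to paste it directly into source files. The following is a minimal sketch, assuming you export the key in an environment variable named `COMPOSO_API_KEY`. That name is chosen here for illustration and is not something the SDK is documented to read on its own. `load_api_key` is a hypothetical helper, not part of the Composo SDK.

```python
import os


def load_api_key(env=None, var_name="COMPOSO_API_KEY"):
    """Return the API key stored in an environment variable.

    COMPOSO_API_KEY is an assumed variable name for this sketch.
    """
    # Allow a plain dict to be passed in so the helper is easy to test.
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before creating the Composo client")
    return key
```

The returned value can then be passed to the client in Step 3, for example `Composo(api_key=load_api_key())`.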
Step 3: Run Your First Evaluation
[Optional] Install the SDK:
Now let’s evaluate a customer service response for empathy and helpfulness using the Composo SDK:
from composo import Composo

# Initialize the client with your API key
composo_client = Composo(api_key="YOUR_API_KEY")

# Example: Evaluating a customer service response
result = composo_client.evaluate(
    messages=[
        {"role": "user", "content": "I'm really frustrated with my device not working."},
        {"role": "assistant", "content": "I'm sorry to hear that you're experiencing issues with your device. Let's see how I can assist you to resolve this problem."}
    ],
    criteria="Reward responses that express appropriate empathy if the user is facing a problem they're finding frustrating"
)

# Display results
print(f"Score: {result.score}")
print(f"Analysis: {result.explanation}")
Understanding the Results
Composo returns:
- Score: A value between 0 and 1 (e.g. 0.86 means the response strongly meets your criteria)
- Explanation: Detailed analysis of why the response received this score
Example output:
Score: 0.86
Analysis: - The assistant directly acknowledges the user's difficulty and expresses sympathy ("I'm sorry to hear that you're experiencing issues"), showing clear empathy.
- The response is timely and supportive, immediately addressing the expressed frustration and not ignoring the emotional content.
- It constructively adds a collaborative next step ("Let's see how I can assist you"), enhancing the empathetic tone, with only minor room for deeper emotional mirroring.
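Because the score is a number between 0 and 1, it can be compared against a threshold to turn an evaluation into a pass/fail check, for example in a regression test. The following is a minimal sketch, assuming a cutoff of 0.7, which is an arbitrary value chosen for illustration and not a Composo recommendation. `EvalResult` is a local stand-in that mirrors only the `score` and `explanation` fields used above, and `gate` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Local stand-in with the two fields the quickstart reads from a result."""
    score: float
    explanation: str


def gate(result, threshold=0.7):
    """Return True when the score meets the threshold (0.7 is an assumed cutoff)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return result.score >= threshold


# Stand-in data shaped like the example output above.
example = EvalResult(score=0.86, explanation="Clear empathy, collaborative next step.")
print("PASS" if gate(example) else "FAIL")
```

In practice the right threshold depends on the criterion and on how strict you want the check to be, so it is worth calibrating against a few responses you have judged by hand.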
Step 4: Evaluate Agents with Tracing
For agent applications, Composo provides real-time tracing to capture and evaluate multi-agent interactions. Here’s a simple example with an orchestrator coordinating two sub-agents:
from composo import Composo
from composo.models import criteria
from composo.tracing import ComposoTracer, Instruments, AgentTracer, agent_tracer
from openai import OpenAI

# Initialize tracing for OpenAI
ComposoTracer.init(instruments=[Instruments.OPENAI])
composo_client = Composo(api_key="YOUR_API_KEY")
openai_client = OpenAI()

# Define a simple sub-agent
@agent_tracer(name="research_agent")
def research_agent(topic):
    return openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Research: {topic}"}],
        max_tokens=50
    )

# Orchestrator coordinates multiple agents
with AgentTracer("orchestrator") as tracer:
    # First sub-agent: planning
    with AgentTracer("planning_agent"):
        plan = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Plan a trip to Paris"}],
            max_tokens=50
        )

    # Second sub-agent: research
    research = research_agent("Paris attractions")

# Evaluate the full agent trace
results = composo_client.evaluate_trace(tracer.trace, criteria=criteria.agent)
for result, criterion in zip(results, criteria.agent):
    print(f"Criterion: {criterion}")
    print(f"Evaluation Result: {result}\n")
This example shows how Composo traces each agent’s LLM calls independently and evaluates the resulting trace against each criterion in our agent framework (criteria.agent).
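In the example above, `plan` and `research` hold raw OpenAI chat completion objects rather than plain strings. If the orchestrator needs the generated text, it can be read from the first choice. The following is a minimal sketch: `completion_text` is a hypothetical helper, and the stand-in object exists only so the snippet runs without a network call.

```python
from types import SimpleNamespace


def completion_text(completion):
    """Return the first choice's message text from a chat completion object."""
    content = completion.choices[0].message.content
    # The content field can be None (for example on tool-call responses).
    return content or ""


# Stand-in shaped like an OpenAI chat completion, so the sketch runs offline.
fake = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="Visit the Louvre."))]
)
print(completion_text(fake))
```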