# Quickstart

> Ship AI agents that actually work in production

Composo delivers deterministic, accurate evaluation for LLM applications through purpose-built generative reward models. Unlike unreliable LLM-as-judge approaches, our specialized models provide consistent, precise scores you can trust, and each evaluation needs only a single-sentence criterion.

Get up and running with Composo in under 5 minutes. This guide walks you through evaluating your first LLM response, reading the score and explanation Composo returns, and evaluating a traced multi-agent run.

## Step 1: Create Your Account

Sign up for a Composo account at [platform.composo.ai](https://platform.composo.ai).

## Step 2: Generate Your API Key

1. Navigate to **Profile** → **API Keys** in the dashboard
2. Click **Generate New API Key**
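
Rather than pasting the key into source code, you may prefer to read it from the environment. The following is a minimal sketch; the variable name `COMPOSO_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not names defined by Composo.

```python
import os


def load_api_key(env_var: str = "COMPOSO_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention, not one mandated by Composo.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Composo API key."
        )
    return key
```

You could then pass `load_api_key()` wherever the examples in this guide use `"YOUR_API_KEY"`.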

## Step 3: Run Your First Evaluation

\[Optional] Install the SDK. The Python example below requires it; the cURL example does not:

```bash theme={null}
pip install composo
```

Now let's evaluate a customer service response for empathy, using either the Composo SDK or the REST API:

<CodeGroup>
  ```python Python wrap theme={null}
  from composo import Composo

  # Initialize the client with your API key
  composo_client = Composo(api_key="YOUR_API_KEY")

  # Example: Evaluating a customer service response
  result = composo_client.evaluate(
      messages=[
          {"role": "user", "content": "I'm really frustrated with my device not working."},
          {"role": "assistant", "content": "I'm sorry to hear that you're experiencing issues with your device. Let's see how I can assist you to resolve this problem."}
      ],
      criteria="Reward responses that express appropriate empathy if the user is facing a problem they're finding frustrating"
  )

  # Display results
  print(f"Score: {result.score}")
  print(f"Analysis: {result.explanation}")
  ```

  ```bash cURL theme={null}
  curl -X POST "https://platform.composo.ai/api/v1/evals/reward" \
    -H "API-Key: YOUR_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "messages": [
        {
          "role": "user",
          "content": "I'\''m really frustrated with my device not working."
        },
        {
          "role": "assistant",
          "content": "I'\''m sorry to hear that you'\''re experiencing issues with your device. Let'\''s see how I can assist you to resolve this problem."
        }
      ],
      "evaluation_criteria": "Reward responses that express appropriate empathy if the user is facing a problem they'\''re finding frustrating"
    }'
  ```
</CodeGroup>
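
If you would rather call the REST endpoint from Python without installing the SDK, the cURL request above can be reproduced with the standard library. This is a sketch under stated assumptions: the URL, header names, and body fields are copied from the cURL example, while the helper names `build_reward_request` and `send` are hypothetical, and the response is only assumed to be JSON (its schema is not assumed).

```python
import json
import urllib.request

# Endpoint taken from the cURL example above.
REWARD_URL = "https://platform.composo.ai/api/v1/evals/reward"


def build_reward_request(api_key, messages, criteria):
    """Build (but do not send) the POST request for a single evaluation."""
    payload = {"messages": messages, "evaluation_criteria": criteria}
    return urllib.request.Request(
        REWARD_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body.

    Assumes the endpoint answers with JSON; inspect the result to see its fields.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_reward_request(
    api_key="YOUR_API_KEY",
    messages=[
        {"role": "user", "content": "I'm really frustrated with my device not working."},
        {"role": "assistant", "content": "I'm sorry to hear that. Let's see how I can help."},
    ],
    criteria="Reward responses that express appropriate empathy if the user is facing a problem they're finding frustrating",
)
# body = send(request)  # uncomment to make the network call
```

Separating request construction from sending keeps the payload easy to inspect or unit-test before any network traffic occurs.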

### Understanding the Results

Composo returns:

* **Score**: A value between 0 and 1 (e.g. 0.86 means the response strongly meets your criteria)
* **Explanation**: Detailed analysis of why the response received this score

Example output:

```text Output wrap theme={null}
Score: 0.86/1.0
Analysis: - The assistant directly acknowledges the user's difficulty and expresses sympathy ("I'm sorry to hear that you're experiencing issues"), showing clear empathy.
- The response is timely and supportive, immediately addressing the expressed frustration and not ignoring the emotional content.
- It constructively adds a collaborative next step ("Let's see how I can assist you"), enhancing the empathetic tone, with only minor room for deeper emotional mirroring.
```
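
Because the score is a number between 0 and 1, it can gate automated checks such as a regression test. A minimal sketch, assuming a hypothetical pass threshold of 0.7; the threshold and the helper name `meets_threshold` are illustrative choices, not values recommended by Composo.

```python
def meets_threshold(score: float, threshold: float = 0.7) -> bool:
    """Return True when a 0-1 score is at or above the chosen threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"expected a score between 0 and 1, got {score}")
    return score >= threshold


print(meets_threshold(0.86))  # True
print(meets_threshold(0.42))  # False
```

In practice you would pass `result.score` from the evaluation call and tune the threshold per criterion.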

## Step 4: Evaluate Agents with Tracing

For agent applications, Composo provides real-time tracing to capture and evaluate multi-agent interactions. Here's a simple example with an orchestrator coordinating two sub-agents:

```python Python wrap theme={null}
from composo import Composo
from composo.models import criteria
from composo.tracing import ComposoTracer, Instruments, AgentTracer, agent_tracer
from openai import OpenAI

# Initialize tracing for OpenAI
ComposoTracer.init(instruments=[Instruments.OPENAI])
composo_client = Composo(api_key="YOUR_API_KEY")
openai_client = OpenAI()

# Define a simple sub-agent
@agent_tracer(name="research_agent")
def research_agent(topic):
    return openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": f"Research: {topic}"}],
        max_tokens=50
    )

# Orchestrator coordinates multiple agents
with AgentTracer("orchestrator") as tracer:
    # First sub-agent: planning
    with AgentTracer("planning_agent"):
        plan = openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": "Plan a trip to Paris"}],
            max_tokens=50
        )

    # Second sub-agent: research
    research = research_agent("Paris attractions")

# Evaluate the full agent trace
results = composo_client.evaluate_trace(tracer.trace, criteria=criteria.agent)

for result, criterion in zip(results, criteria.agent):
    print(f"Criterion: {criterion}")
    print(f"Evaluation Result: {result}\n")
```

This example shows how Composo traces each agent's LLM calls independently and evaluates the full trace against the built-in agent criteria (`criteria.agent`), returning one result per criterion.