- Dual Client Support: Both synchronous and asynchronous clients
- Convenient Format: Compatible with Python dictionaries and result objects from OpenAI and Anthropic
- HTTP Goodies: Connection pooling + retry logic
Note: This SDK is for Python users. If you’re using TypeScript, JavaScript, or other languages, please refer to the REST API Reference to call the API directly.
Installation
Install the SDK using pip:
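(Assuming the package is published on PyPI under the name `composo`:)

```bash
pip install composo
```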
Quick Start
Let's run a simple Hello World evaluation to get started with Composo.
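A minimal sketch, assuming the import path `composo` and an API key in the `COMPOSO_API_KEY` environment variable:

```python
from composo import Composo  # import path assumed

client = Composo()  # reads COMPOSO_API_KEY from the environment

result = client.evaluate(
    messages=[
        {"role": "user", "content": "Say hello to the world."},
        {"role": "assistant", "content": "Hello, world!"},
    ],
    criteria="Reward responses that greet the world politely.",
)
print(result)
```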
Reference
Client Parameters
Both Composo and AsyncComposo clients accept the following parameters during instantiation (see the sketch below the table):
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| api_key | str | No* | None | Your Composo API key. If not provided, the COMPOSO_API_KEY environment variable is used |
| model_core | str | No | Latest Align model | The model to use for evaluation. Options: align-20250529, align-lightning-20250731 |
| num_retries | int | No | 1 | Number of retry attempts for failed requests |
*Required if the COMPOSO_API_KEY environment variable is not set.
The Lightning model does not currently support agents and tool calling; for those evaluations you must use the default Align model.
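A sketch of explicit instantiation with these parameters (the import path is an assumption):

```python
from composo import Composo, AsyncComposo  # import path assumed

client = Composo(
    api_key="your-api-key",       # optional if COMPOSO_API_KEY is set
    model_core="align-20250529",  # the default Align model
    num_retries=3,                # overrides the default of 1
)

async_client = AsyncComposo(api_key="your-api-key")
```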
Evaluation Method Parameters
The evaluate() method accepts the following parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| messages | List[Dict] | Yes | List of message dictionaries with 'role' and 'content' keys |
| criteria | str or List[str] | Yes | Evaluation criteria (single string or list of criteria) |
| tools | List[Dict] | No | Tool definitions for evaluating tool calls |
| result | OpenAI/Anthropic result object | No | Pre-computed LLM result object to evaluate |
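For instance, a tool-call evaluation might look like the following sketch; the tool and tool-call schemas shown here follow the OpenAI function-tool format, which is an assumption rather than a documented requirement:

```python
from composo import Composo  # import path assumed

client = Composo()

# Tool definition in the OpenAI function-tool format (assumed)
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

result = client.evaluate(
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"},
        {
            "role": "assistant",
            "tool_calls": [
                {
                    "id": "call_1",
                    "type": "function",
                    "function": {
                        "name": "get_weather",
                        "arguments": '{"city": "Paris"}',
                    },
                }
            ],
        },
    ],
    tools=tools,
    criteria="Reward tool calls that pass the city exactly as the user stated it.",
)
```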
Environment Variables
The SDK supports the following environment variables:
- COMPOSO_API_KEY: Your Composo API key (used when the api_key parameter is not provided)
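A quick sketch of the fallback behaviour; setting the variable from Python here stands in for an `export` in your shell:

```python
import os

from composo import Composo  # import path assumed

# Equivalent to `export COMPOSO_API_KEY=...` in your shell
os.environ["COMPOSO_API_KEY"] = "your-api-key"

client = Composo()  # no api_key argument needed; the env var is picked up
```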
Response Format
The evaluate method returns an EvaluationResponse object:
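A sketch of reading the response; the score and explanation attribute names are assumptions for illustration, not confirmed field names:

```python
from composo import Composo  # import path assumed

client = Composo()

response = client.evaluate(
    messages=[
        {"role": "user", "content": "Say hello to the world."},
        {"role": "assistant", "content": "Hello, world!"},
    ],
    criteria="Reward responses that greet the world politely.",
)

# Hypothetical attribute names — check EvaluationResponse for the real fields
print(response.score)        # numeric evaluation score
print(response.explanation)  # rationale behind the score
```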
Async Evaluation
Use the async client when you need to run multiple evaluations concurrently or integrate with async workflows.
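A minimal async sketch, again assuming the `composo` import path:

```python
import asyncio

from composo import AsyncComposo  # import path assumed

async def main() -> None:
    client = AsyncComposo()  # reads COMPOSO_API_KEY from the environment
    result = await client.evaluate(
        messages=[
            {"role": "user", "content": "Summarise the plot of Hamlet."},
            {"role": "assistant", "content": "A Danish prince avenges his murdered father."},
        ],
        criteria="Reward summaries that are accurate and concise.",
    )
    print(result)

asyncio.run(main())
```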
Multiple Criteria Evaluation
When evaluating against multiple criteria, the async client runs all evaluations concurrently for better performance.
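A sketch passing a list of criteria; the exact shape of the return value for multiple criteria isn't documented here, so this example simply prints it:

```python
import asyncio

from composo import AsyncComposo  # import path assumed

async def main() -> None:
    client = AsyncComposo()
    results = await client.evaluate(
        messages=[
            {"role": "user", "content": "Explain photosynthesis."},
            {"role": "assistant", "content": "Plants turn light into chemical energy."},
        ],
        criteria=[
            "Reward responses that are factually accurate.",
            "Reward responses that a non-expert can follow.",
        ],
    )
    print(results)

asyncio.run(main())
```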
Evaluating OpenAI/Anthropic Outputs
You can directly evaluate the result of a call to the OpenAI SDK by passing the return value of completions.create to evaluate() via the result parameter. N.B. Composo will always evaluate choices[0].
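A sketch with the OpenAI SDK (the Composo import path and the model name chosen here are illustrative):

```python
from composo import Composo  # import path assumed
from openai import OpenAI

openai_client = OpenAI()
composo_client = Composo()

messages = [{"role": "user", "content": "Write a haiku about the sea."}]
completion = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)

# Composo evaluates completion.choices[0]
evaluation = composo_client.evaluate(
    messages=messages,
    result=completion,
    criteria="Reward haikus that follow the 5-7-5 syllable structure.",
)
```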
Error Handling
The SDK provides specific exception types:
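The concrete exception classes aren't listed on this page, so the names below are placeholders for illustration only; substitute the SDK's real exception types:

```python
from composo import Composo  # import path assumed
# Hypothetical names — the SDK's actual exception classes may differ
from composo import ComposoAPIError, ComposoRateLimitError

client = Composo()

try:
    result = client.evaluate(
        messages=[
            {"role": "user", "content": "Hi"},
            {"role": "assistant", "content": "Hello!"},
        ],
        criteria="Reward friendly greetings.",
    )
except ComposoRateLimitError:
    pass  # back off and retry later
except ComposoAPIError as exc:
    print(f"Evaluation failed: {exc}")
```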
Logging
The SDK uses Python's standard logging module. Configure the logging level:
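A minimal sketch, assuming the SDK logs under the "composo" logger name:

```python
import logging

# Show the SDK's debug output; the "composo" logger name is an assumption
logging.basicConfig(level=logging.INFO)
logging.getLogger("composo").setLevel(logging.DEBUG)
```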