Composo provides a Python SDK for running evaluations, with:
  • Dual Client Support: Both synchronous and asynchronous clients
  • Convenient Formats: Compatible with Python dictionaries and result objects from OpenAI and Anthropic
  • HTTP Goodies: Connection pooling + retry logic
Note: This SDK is for Python users. If you’re using TypeScript, JavaScript, or other languages, please refer to the REST API Reference to call the API directly.

Installation

Install the SDK using pip:
pip install composo

Quick Start

Let’s run a simple Hello World evaluation to get started.
Python
from composo import Composo

composo_client = Composo()

result = composo_client.evaluate(
    messages=[
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hello! How can I help you today?"}
    ],
    criteria="Reward responses that are friendly"
)

print(f"Score: {result.score}")
print(f"Explanation: {result.explanation}")

Reference

Client Parameters

Both Composo and AsyncComposo clients accept the following parameters during instantiation:
Parameter    Type  Required  Default  Description
api_key      str   No*       None     Your Composo API key. If not provided, the COMPOSO_API_KEY environment variable is used
num_retries  int   No        1        Number of retry attempts for failed requests
*Required if the COMPOSO_API_KEY environment variable is not set.
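
For example, to pass the key explicitly and allow more retries (the key shown is a placeholder):
Python
from composo import Composo

# Pass the API key explicitly and allow up to 3 retry attempts
composo_client = Composo(api_key="your-composo-api-key", num_retries=3)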

Evaluation Method Parameters

The evaluate() method accepts the following parameters:
Parameter  Type                            Required  Description
messages   List[Dict]                      Yes       List of message dictionaries with 'role' and 'content' keys
criteria   str or List[str]                Yes       Evaluation criteria (a single string or a list of criteria)
tools      List[Dict]                      No        Tool definitions for evaluating tool calls
result     OpenAI/Anthropic result object  No        Pre-computed LLM result object to evaluate
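
The tools parameter lets you evaluate assistant turns that involve tool use. A minimal sketch, assuming tool definitions in the OpenAI function-calling format (the exact schema Composo expects may differ; see the REST API Reference):
Python
from composo import Composo

composo_client = Composo()

# Assumed OpenAI-style tool definition; the name and schema are illustrative
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

result = composo_client.evaluate(
    messages=[
        {"role": "user", "content": "What's the weather in Paris?"},
        {"role": "assistant", "content": "I'll look up the current weather in Paris."},
    ],
    tools=[weather_tool],
    criteria="Reward responses that make appropriate use of the available tools",
)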

Environment Variables

The SDK supports the following environment variables:
  • COMPOSO_API_KEY: Your Composo API key (used when api_key parameter is not provided)
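
For example, you can set the key in the environment before instantiating the client; a shell export works equally well. A minimal sketch (the key value is a placeholder):
Python
import os
from composo import Composo

# Equivalent to `export COMPOSO_API_KEY=...` in your shell
os.environ["COMPOSO_API_KEY"] = "your-composo-api-key"

composo_client = Composo()  # picks up the key from the environment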

Response Format

The evaluate method returns an EvaluationResponse object:
Python
class EvaluationResponse:
    score: Optional[float]      # Score from 0 to 1; may be None
    explanation: str            # Evaluation explanation
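
Since score is Optional, guard against None before using it numerically. A minimal sketch reusing the Quick Start setup:
Python
from composo import Composo

composo_client = Composo()

result = composo_client.evaluate(
    messages=[
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hello! How can I help you today?"}
    ],
    criteria="Reward responses that are friendly"
)

if result.score is not None:
    print(f"Score: {result.score:.2f}")
else:
    # explanation is always a string, even when no numeric score is returned
    print(f"No score returned: {result.explanation}")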

Async Evaluation

Use the async client when you need to run multiple evaluations concurrently or integrate with async workflows.
Python
import asyncio
from composo import AsyncComposo

async def main():
    composo_client = AsyncComposo()
    result = await composo_client.evaluate(
        messages=[
            {"role": "user", "content": "Hello"},
            {"role": "assistant", "content": "Hello! How can I help you today?"}
        ],
        criteria="Reward responses that are friendly"
    )
    
    print(f"Score: {result.score}")
    print(f"Explanation: {result.explanation}")

asyncio.run(main())
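
To evaluate several conversations concurrently, you can fan the evaluate calls out with asyncio.gather. A minimal sketch using only the documented API:
Python
import asyncio
from composo import AsyncComposo

async def main():
    client = AsyncComposo()

    conversations = [
        [{"role": "user", "content": "Hello"},
         {"role": "assistant", "content": "Hello! How can I help you today?"}],
        [{"role": "user", "content": "What is 2 + 2?"},
         {"role": "assistant", "content": "2 + 2 equals 4."}],
    ]

    # One evaluate() call per conversation, awaited together
    results = await asyncio.gather(*(
        client.evaluate(messages=m, criteria="Reward responses that are friendly")
        for m in conversations
    ))

    for result in results:
        print(f"Score: {result.score}")

asyncio.run(main())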

Multiple Criteria Evaluation

When evaluating against multiple criteria, the async client runs all evaluations concurrently for better performance.
Python
import asyncio
from composo import AsyncComposo

async def main():
    client = AsyncComposo()

    messages = [
        {"role": "user", "content": "Explain quantum computing in simple terms"},
        {"role": "assistant", "content": "Quantum computing uses quantum mechanics to process information..."}
    ]

    criteria = [
        "Reward responses that explain complex topics in simple terms",
        "Reward responses that provide accurate technical information",
        "Reward responses that are engaging and easy to understand"
    ]

    results = await client.evaluate(messages=messages, criteria=criteria)

    for i, result in enumerate(results):
        print(f"Criteria {i+1}: Score = {result.score}")
        print(f"Explanation: {result.explanation}\n")

asyncio.run(main())

Evaluating OpenAI/Anthropic Outputs

You can directly evaluate the result of a call to the OpenAI SDK by passing the return value of chat.completions.create to Composo's evaluate method. N.B. Composo will always evaluate choices[0].
Python
import openai
from composo import Composo

composo_client = Composo()

openai_client = openai.OpenAI(api_key="your-openai-key")
openai_result = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is machine learning?"}]
)

result = composo_client.evaluate(
    messages=[{"role": "user", "content": "What is machine learning?"}],
    result=openai_result,
    criteria="Reward accurate technical explanations"
)

print(f"Score: {result.score}")

Error Handling

The SDK provides specific exception types:
Python
from composo import (
    ComposoError,
    RateLimitError,
    MalformedError,
    APIError,
    AuthenticationError
)

try:
    result = composo_client.evaluate(messages=messages, criteria=criteria)
except RateLimitError:
    print("Rate limit exceeded")
except AuthenticationError:
    print("Invalid API key")
except ComposoError as e:
    print(f"Composo error: {e}")

Logging

The SDK uses Python’s standard logging module. Configure the logging level:
Python
import logging

# Attach a default handler so log records are actually emitted
logging.basicConfig()
logging.getLogger("composo").setLevel(logging.INFO)