This SDK is for Python users. If you’re using TypeScript, JavaScript, or other languages, please refer to the REST API Reference to call the API directly.

Installation

Install the SDK using pip:
pip install composo
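
The examples below construct the client with no arguments, which assumes your API key is already available to the SDK (typically via an environment variable). If you prefer to pass the key explicitly, here is a minimal sketch, assuming the constructor accepts an api_key argument and that the variable is named COMPOSO_API_KEY (confirm both against the REST API Reference):

import os
from composo import Composo

# Assumption: the client accepts an explicit api_key and otherwise falls back
# to the environment (COMPOSO_API_KEY is assumed here).
client = Composo(api_key=os.environ.get("COMPOSO_API_KEY"))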

Quick Start

Basic Evaluation

from composo import Composo

client = Composo()

# Evaluate response quality
result = client.evaluate(
    messages=[
        {"role": "user", "content": "I'm frustrated with my device not working."},
        {"role": "assistant", "content": "I'm sorry to hear that. Let's see how I can help you resolve this problem."}
    ],
    criteria="Reward responses that express appropriate empathy if the user is facing a problem they're finding frustrating"
)

print(f"Score: {result.score}")
print(f"Explanation: {result.explanation}")

Asynchronous Client

import asyncio
from composo import AsyncComposo

async def main():
    client = AsyncComposo()
    result = await client.evaluate(
        messages=[
            {"role": "user", "content": "I'm frustrated with my device not working."},
            {"role": "assistant", "content": "I'm sorry to hear that. Let's see how I can help you resolve this problem."}
        ],
        criteria="Reward responses that express appropriate empathy if the user is facing a problem they're finding frustrating"
    )

    print(f"Score: {result.score}")
    print(f"Explanation: {result.explanation}")

asyncio.run(main())
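
The async client is most useful when you need to score several conversations concurrently. The sketch below fans out the same evaluate call with asyncio.gather; the conversations are illustrative.

import asyncio
from composo import AsyncComposo

async def evaluate_batch():
    client = AsyncComposo()
    conversations = [
        [
            {"role": "user", "content": "My order never arrived."},
            {"role": "assistant", "content": "I'm sorry about that. Let me look into your order right away."}
        ],
        [
            {"role": "user", "content": "The app keeps crashing on launch."},
            {"role": "assistant", "content": "That sounds frustrating. Which version of the app are you running?"}
        ]
    ]
    criteria = "Reward responses that express appropriate empathy if the user is facing a problem they're finding frustrating"

    # Run all evaluations concurrently and wait for every result
    results = await asyncio.gather(
        *(client.evaluate(messages=m, criteria=criteria) for m in conversations)
    )
    for result in results:
        print(f"Score: {result.score}")

asyncio.run(evaluate_batch())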

Advanced Examples

Multiple Criteria Evaluation

from composo import Composo

client = Composo()

messages = [
    {"role": "user", "content": "Explain quantum computing in simple terms"},
    {"role": "assistant", "content": "Quantum computing uses quantum mechanics to process information..."}
]

criteria = [
    "Reward responses that explain complex topics in simple terms",
    "Reward responses that provide accurate technical information",
    "Reward responses that are engaging and easy to understand"
]

results = client.evaluate(messages=messages, criteria=criteria)

for i, result in enumerate(results):
    print(f"Criteria {i+1}: Score = {result.score}")
    print(f"Explanation: {result.explanation}\n")

Tool Call Evaluation

from composo import Composo

client = Composo()

messages = [
    {"role": "user", "content": "What's the weather like in New York?"},
    {
        "role": "assistant", 
        "content": "Let me check the weather for you.",
        "tool_calls": [
            {
                "id": "call_123",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"location": "New York"}'
                }
            }
        ]
    }
]

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather information for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string", "description": "City name"}
                },
                "required": ["location"]
            }
        }
    }
]

result = client.evaluate(
    messages=messages,
    tools=tools,
    criteria="Reward responses that make relevant tool calls to address the user's prompt"
)

print(f"Tool Call Score: {result.score}")

Binary Evaluation

from composo import Composo

client = Composo()

messages = [
    {"role": "user", "content": "What is 2+2?"},
    {"role": "assistant", "content": "2+2 equals 4"}
]

result = client.evaluate(
    messages=messages,
    criteria="Response passes if it provides the correct mathematical answer"
)

print(f"Passed: {abs(result.score - 1.0) < 1e-6}")
print(f"Explanation: {result.explanation}")

Evaluating LLM Results

import openai
from composo import Composo

client = Composo()

openai_client = openai.OpenAI(api_key="your-openai-key")
openai_result = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "What is machine learning?"}]
)

# Score the completion returned by OpenAI against the criterion
result = client.evaluate(
    messages=[{"role": "user", "content": "What is machine learning?"}],
    result=openai_result,
    criteria="Reward accurate technical explanations"
)

print(f"Score: {result.score}")

Response Format

The evaluate method returns an EvaluationResponse object (a list of them when you pass a list of criteria):
from typing import Optional

class EvaluationResponse:
    score: Optional[float]      # Score from 0 to 1
    explanation: str            # Evaluation explanation

Example Response

result = client.evaluate(messages=messages, criteria=criteria)

print(f"Score: {result.score}")
print(f"Explanation: {result.explanation}")