Composo Reward Scoring provides precise, quantitative scoring of your LLM outputs against any custom criteria.

When to Use Reward Scoring

Use Reward Scoring evaluation when you need fine-grained assessments of responses based on complex, subjective criteria. This method is ideal for:

  • Assessing adherence to source material
  • Tailoring responses to match specific user preferences or brand voices
  • Ensuring responses are comprehensive yet relevant and concise

Example: Evaluating Tone and Style

Suppose you are developing a customer support chatbot and want to ensure the responses are empathetic.

import requests

# Reward-scoring endpoint and the header that carries your API key
url = "https://platform.composo.ai/api/v1/evals/reward"
headers = {
    "API-Key": "YOUR_API_KEY"
}
# The conversation to evaluate, plus the criteria to score it against
payload = {
    "messages": [
        {"role": "user", "content": "I'm really frustrated with my device not working."},
        {"role": "assistant", "content": "I'm sorry to hear that you're experiencing issues with your device. Let's see how I can assist you to resolve this problem."}
    ],
    "evaluation_criteria": "Reward responses that express appropriate empathy if the user is facing a problem they're finding frustrating"
}

response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # fail on an HTTP error instead of parsing an error body
result = response.json()

# The score can be null (None in Python), so it is printed without numeric formatting
print(f"Score: {result['score']}")
print(f"Explanation: {result.get('explanation', 'No feedback provided.')}")

Note: When the criteria describe a non-monotonic quality (one where more is not always better) using a qualifier (e.g., “appropriate”), the score reflects how close the response comes to the optimal level of that quality. Higher scores indicate closer adherence to what is appropriate in the context.
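
If you evaluate many conversations against the same criteria, the request body from the example can be assembled by a small helper. The sketch below is illustrative only: the function name build_reward_payload is hypothetical and not part of any Composo library, and it uses only the two fields shown in the example ("messages" and "evaluation_criteria").

```python
def build_reward_payload(user_message: str, assistant_message: str, criteria: str) -> dict:
    """Assemble a request body in the same shape as the example request.

    Hypothetical helper for illustration; not part of any Composo client.
    """
    return {
        "messages": [
            {"role": "user", "content": user_message},
            {"role": "assistant", "content": assistant_message},
        ],
        "evaluation_criteria": criteria,
    }


payload = build_reward_payload(
    "I'm really frustrated with my device not working.",
    "I'm sorry to hear that. Let's work through it together.",
    "Reward responses that express appropriate empathy if the user is facing a problem they're finding frustrating",
)
```

The resulting dictionary can be passed as the json argument of requests.post, exactly as in the example.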

Interpreting the Score

  • Score: A continuous value between 0 and 1 indicating how well the response meets the evaluation criteria.
    • 0: Does not meet the criteria at all.
    • Values between 0 and 1: Partially meets the criteria.
    • 1: Fully meets the criteria.
    • null: The evaluation criteria were deemed not applicable to the application output.
  • Explanation: Optional detailed explanation providing insights into the evaluation.
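
The score values listed above can be handled in code along the following lines. This is a minimal sketch, not part of the Composo API: describe_score is a hypothetical helper, and the range check is an assumption added for defensive handling. It relies on the fact that a JSON null becomes None once the response is parsed in Python.

```python
from typing import Optional


def describe_score(score: Optional[float]) -> str:
    """Map a reward score to a short description of what it means.

    Hypothetical helper for illustration; not part of any Composo client.
    """
    if score is None:
        # JSON null: the criteria were deemed not applicable to this output
        return "not applicable"
    if not 0.0 <= score <= 1.0:
        # Assumption: treat anything outside the documented range as an error
        raise ValueError(f"score outside the documented 0-1 range: {score}")
    if score == 0.0:
        return "does not meet the criteria"
    if score == 1.0:
        return "fully meets the criteria"
    return "partially meets the criteria"
```

Checking for None first matters because numeric comparisons or numeric formatting applied to a null score would raise an error.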

Composo Reward Scoring is trained to give accurate, precise scores, which can be used to fine-tune your model’s responses so that they align closely with your application’s goals.