Reward Score Evaluation
Fine-grained assessments based on custom criteria
Composo Reward Scoring provides precise, quantitative, accurate scoring for your LLM outputs based on any custom criteria.
When to Use Reward Scoring
Use Reward Scoring evaluation when you need fine-grained assessments of responses based on complex, subjective criteria. This method is ideal for:
- Assessing adherence to source material
- Tailoring responses to match specific user preferences or brand voices.
- Ensuring responses are comprehensive yet relevant and concise
Example: Evaluating Tone and Style
Suppose you are developing a customer support chatbot and want to ensure the responses are empathetic.
Note: When evaluating non-monotonic qualities with qualifiers (e.g., “appropriate”), the score reflects how well the response meets the optimal level of that quality. Higher scores indicate better adherence to what is considered appropriate or optimal in the context.
Interpreting the Score
- Score: A continuous value between 0 and 1 indicating how well the response meets the evaluation criteria.
- 0: Does not meet the criteria at all.
- Values between 0 and 1: Partially meets the criteria.
- 1: Fully meets the criteria.
- null: The evaluation criteria was deemed not applicable to the application output.
- Explanation: Optional detailed explanation providing insights into the evaluation.
Composo Reward Score evaluation is trained to give accurate, precise scores which can be used to fine-tune your model’s responses to align closely with your application’s goals.