When crafting your evaluation criteria, consider the following guidelines to ensure effective and meaningful assessments:
Be Specific and Focused: Clearly define the quality or behavior you want to evaluate. Avoid vague statements. Focus on a single aspect per criterion to maintain clarity.
Use Clear Direction: Begin your criteria with an explicit directive such as "Reward responses that..." or "Penalize responses where...". Example: "Reward responses that use empathetic language when addressing user concerns."
Monotonic or Appropriately Qualified Qualities: Ideally, the quality you’re assessing should be monotonic, i.e. more of the quality is better (for rewards) or worse (for penalties). However, when dealing with non-monotonic qualities (where more is not always better), use qualifiers such as “appropriate” to ensure that higher scores represent better adherence to the desired quality.
"Reward responses that are polite"
which can become excessive, use "Reward responses that use an appropriate level of politeness"
ensuring that the response is polite but not overly so.Avoid Conjunctions: Focus on one quality at a time. Using conjunctions like “and” might indicate multiple qualities, which can lead to poorly defined behavior if one of the two qualities is poorly adhered to.
"The assistant should be concise and informative"
split into separate criteria.Avoid LLM Keywords: Composo’s reward model is finetuned from LLM models trained in conversation format. Avoid alternate definitions of ‘User’ and ‘Assistant’ that might conflict with LLM keywords ‘user’ and ‘assistant’
"Reward responses that comprehensively address the User Question"
, rename the ‘User Question’ in your prompt and consider "Reward responses that comprehensively address the Target Question"
Domain-Specific: Domain expertise is your secret weapon in getting good evaluation quality. Injecting your own domain knowledge and understanding of what a ‘good’ answer is improves the leverage your evaluation model has over the generative model.
Qualifiers (Optional): If the criterion applies only to certain situations, include a qualifier starting with “if” to specify when it should be applied.
"Reward responses that provide code examples if the user asks for implementation details"
Example Clauses and Recommendations for Improvements
OK. Clarification about ‘correct’ would be useful: does it have to be factually correct, or only in agreement with the provided context?
Poor. Clauses that change the definition of user and assistant from the LLM definition risk confusion.
Good. ‘Properly’ is slightly ambiguous and conflates two concepts: citation style and accuracy.
Good. Could be made better by clarifying what the agent might be guessing at.
Poor. Clauses that change the definition of user and assistant from the LLM definition risk confusion.
Excellent. It’s clear what format we’re looking for and what kind of information that applies to.
OK. Somewhat ambiguous about what should be supported by the context: is it the next steps, or whether the question is relevant to the context?
Excellent. It’s clear what the expected input is and what the model should be doing.
Excellent. It’s clear that we’re trying to avoid fabricating names of specific entities and the examples make it even clearer.
Excellent. It’s clear that we’re looking for good coverage of the important elements in the response.
Excellent. A clear requirement for chronological presentation of the information in the support interaction.
Excellent. It’s clear that we’re trying to avoid verbose summary content that isn’t clearly derived from the provided ticket.
OK. The concern with this is that there are two separate criteria, and the behaviour will be poorly conditioned when the response does one but not the other, e.g. demonstrates empathy but doesn’t acknowledge the friend’s feelings. Changing to ‘and’ would be better, but two criteria would be even better.
OK. The model is specifically trained to recognise ‘if’ statements, so we’d recommend changing ‘when’ to ‘if’.
Excellent. A clear requirement for tone, with additional helpful context about why it’s needed.
Components: an explicit directive ("Reward responses that..." or "Penalize responses where..."), the single quality or behavior being assessed, and an optional "if" qualifier.
Example Criteria:
"Reward responses that provide a comprehensive analysis of the code snippet"
"Penalize responses where the language is overly technical if the response is for a beginner"
"Reward responses that use an appropriate level of politeness"