"Reward responses that..."
, "Penalize responses that..."
, "Reward tool calls..."
, "Reward agents that..."
.
"Reward responses that use empathetic language when addressing user concerns."
"Reward responses that are polite"
which can become excessive, use "Reward responses that use an appropriate level of politeness"
ensuring the response is polite but not overly so."The assistant should be concise and informative"
split into two separate criteria."Reward responses that comprehensively address the User Question"
, rename the ‘User Question’ in your prompt and use "Reward responses that comprehensively address the Target Question"
"Reward responses that distinguish between emergency symptoms requiring immediate care versus symptoms suitable for routine appointments"
"Reward responses that provide code examples if the user asks for implementation details"
Example Clauses And Recommendations for Improvements
"Reward responses that provide a comprehensive analysis of the code snippet"
"Penalize responses where the language is overly technical if the response is for a beginner"
"Reward responses that use an appropriate level of politeness"
"Reward agents that explore new information and capabilities despite uncertainty"
"Tool call passes if all required parameters are provided without fabrication"