Evaluates: Response quality, tone, safety, adherence to guidelines and any other custom criteria that can be highly domain specific. Example criteria:
  • Concise: “Reward responses that are clear and concise, avoiding unnecessary verbosity or repetition.”
  • Information Structure: “Reward responses that present technical information in a logical, well-organized format that prioritizes the most important details.”
  • Actionable Guidance: “Reward responses that provide practical next steps”
  • Professional: “Reward responses that maintain appropriate professional tone and language suitable for the context”
  • Harmful output: “Penalize responses that provide medical advice”