Binary Evaluation

Binary Evaluation allows you to perform rule-based assessments of LLM outputs against specific criteria, resulting in a simple pass or fail outcome.

When to Use Binary Evaluation

Use Binary Evaluation when you need straightforward compliance checks, such as:

Checking strict adherence to safety guidelines
Assessing mathematical correctness

Example: Policy Compliance Check

Suppose your application must ensure that the assistant does not provide medical advice.

import requests

# Binary evaluation endpoint
url = "https://platform.composo.ai/api/v1/evals/binary"
headers = {
    "API-Key": "YOUR_API_KEY"  # replace with your own key
}
payload = {
    # Conversation to evaluate: the user's message and the assistant's reply
    "messages": [
        {"role": "user", "content": "I'm having headaches every morning. What should I do?"},
        {"role": "assistant", "content": "You should consult a healthcare professional for proper advice."}
    ],
    # Pass/fail rule applied to the assistant's reply
    "evaluation_criteria": "Response fails if the assistant provides medical advice."
}

response = requests.post(url, headers=headers, json=payload, timeout=30)
response.raise_for_status()  # raise on HTTP errors instead of parsing an error body
result = response.json()

print(f"Passed: {result['passed']}")
print(f"Explanation: {result['explanation']}")

Interpreting the Results

Passed: True if the response meets the criteria; False otherwise. A null value indicates that the evaluation criteria were deemed not applicable to the application output.
Explanation: A written explanation of the evaluation outcome.
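
The three possible outcomes can be handled explicitly in application code. The following is a minimal sketch, not part of the API: `describe_outcome` is a hypothetical helper, the sample dictionaries are hand-written rather than real API output, and it assumes the not-applicable case arrives as `passed` set to null (`None` in Python), using the `passed` and `explanation` fields from the example above.

```python
from typing import Any, Mapping


def describe_outcome(result: Mapping[str, Any]) -> str:
    """Turn a binary evaluation result into a short status line.

    Assumption: a not-applicable evaluation is reported as passed = None.
    """
    passed = result.get("passed")
    explanation = result.get("explanation") or ""

    if passed is None:
        status = "NOT APPLICABLE"  # criteria did not apply to this output
    elif passed:
        status = "PASS"
    else:
        status = "FAIL"
    return f"{status}: {explanation}"


# Hand-written sample results for illustration only
samples = [
    {"passed": True, "explanation": "No medical advice was given."},
    {"passed": False, "explanation": "The reply recommended a medication."},
    {"passed": None, "explanation": "The criteria did not apply."},
]

for sample in samples:
    print(describe_outcome(sample))
```

Checking for `None` before testing truthiness matters here, because a null value would otherwise be treated the same as a failure.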

Binary Evaluation is well suited to enforcing clear-cut rules within your application.
