Accuracy Evaluation

Evaluates: Faithfulness to sources, completeness of information use, and proper citations.

Implementation: Include the retrieved contexts within the user message or tool call, then use criteria focused on:

Faithfulness: “Reward responses that make only claims directly supported by the provided source material without any hallucination or speculation”
Completeness: “Reward responses that comprehensively include all relevant information from the source material needed to fully answer the question”
Precision: “Reward responses that include only information necessary to answer the question without extraneous details from the source material”
Relevance: “Reward responses where all content directly addresses and is relevant to answering the user’s specific question”
Refusals: “Reward responses that appropriately refuse to answer when the source material lacks sufficient information to address the question”
Sources: “Reward responses that explicitly cite or reference the specific source documents or sections used to support each claim”

Python

from composo import Composo

composo_client = Composo(api_key="your-api-key-here")

# Example: Evaluating how well an LLM uses provided context
result = composo_client.evaluate(
    messages=[
        {
            "role": "user", 
            "content": """What is the current population of Tokyo?

Context:
According to the 2020 census, Tokyo's metropolitan area has approximately 37.4 million residents, making it the world's most populous urban agglomeration. The Tokyo Metropolis itself has 14.0 million people."""
        },
        {
            "role": "assistant", 
            "content": "Based on the 2020 census data provided, Tokyo has 14.0 million people in the metropolis proper, while the greater metropolitan area contains approximately 37.4 million residents, making it the world's largest urban agglomeration."
        }
    ],
    criteria="Reward responses that accurately use the provided context and cite specific data points"
)

print(f"Score: {result.score}")
print(f"Explanation: {result.explanation}")
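
The example above scores a single, combined criterion. As a minimal sketch, the six criteria listed in this section could instead be scored one at a time, so that a low score points at a specific failure (for example, faithfulness versus completeness). The names `ACCURACY_CRITERIA`, `build_messages`, and `evaluate_all` are hypothetical helpers introduced here for illustration and are not part of the Composo SDK. The sketch assumes only that the evaluation callable accepts `messages` and `criteria` keyword arguments, as `composo_client.evaluate` does in the example above.

```python
from typing import Any, Callable, Dict, List

# Criteria text copied from the list in this section (keys are illustrative labels).
ACCURACY_CRITERIA: Dict[str, str] = {
    "faithfulness": "Reward responses that make only claims directly supported by the provided source material without any hallucination or speculation",
    "completeness": "Reward responses that comprehensively include all relevant information from the source material needed to fully answer the question",
    "precision": "Reward responses that include only information necessary to answer the question without extraneous details from the source material",
    "relevance": "Reward responses where all content directly addresses and is relevant to answering the user's specific question",
    "refusals": "Reward responses that appropriately refuse to answer when the source material lacks sufficient information to address the question",
    "sources": "Reward responses that explicitly cite or reference the specific source documents or sections used to support each claim",
}


def build_messages(question: str, context: str, answer: str) -> List[Dict[str, str]]:
    """Place the retrieved context inside the user message, mirroring the example above."""
    return [
        {"role": "user", "content": f"{question}\n\nContext:\n{context}"},
        {"role": "assistant", "content": answer},
    ]


def evaluate_all(
    evaluate_fn: Callable[..., Any],
    messages: List[Dict[str, str]],
) -> Dict[str, Any]:
    """Call evaluate_fn once per criterion and return the results keyed by label.

    evaluate_fn is assumed to accept `messages` and `criteria` keyword arguments.
    """
    return {
        label: evaluate_fn(messages=messages, criteria=text)
        for label, text in ACCURACY_CRITERIA.items()
    }
```

Under these assumptions, passing `composo_client.evaluate` as `evaluate_fn` would return one result per criterion, and each result's `score` and `explanation` could be read as in the example above. Note that this makes six separate evaluation calls per response.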
