FAQs
Should I include system messages when evaluating with Composo?
Including system messages is optional but recommended, as they provide useful context that can improve evaluation accuracy.
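As an illustration, here is roughly what including the system message looks like. The payload shape and field names below are assumptions for the sketch, not the exact Composo API schema; the point is simply that the system message is passed alongside the conversation rather than stripped out.

```python
# Illustrative payload only: field names are assumptions, not the exact schema.
payload = {
    "messages": [
        # Keeping the system message gives the evaluator the same
        # context the assistant had when producing its response.
        {"role": "system", "content": "You are a concise support agent."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security and click 'Reset password'."},
    ],
    "evaluation_criteria": "Reward responses that are accurate and concise.",
}
```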
What’s the context limit?
200k tokens
What’s the expected response time?
Composo Align (flagship model): 5-15 seconds per API call
Composo Lightning: 3 seconds per API call
Can I run parallel requests?
Yes. For optimal performance, we recommend limiting to 5 parallel API calls.
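One simple way to cap concurrency at 5 is a thread pool with `max_workers=5`. This is a generic sketch: `evaluate_one` is a placeholder for whatever function issues a single Composo API call, not part of the SDK.

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate_all(inputs, evaluate_one, max_parallel=5):
    """Run evaluations with at most `max_parallel` concurrent calls.

    `evaluate_one` stands in for a function that makes one API call;
    the pool ensures no more than `max_parallel` are in flight at once.
    """
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(evaluate_one, inputs))

# Usage with a dummy evaluator (no network call):
scores = evaluate_all([0.2, 0.8, 0.5], evaluate_one=lambda x: x * 2)
```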
What are the rate limits?
Free plan: 500 requests per hour
Paid plans: higher limits based on your specific plan
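If you want to stay safely under the free plan's 500 requests/hour, a small client-side guard can help. The sliding-window limiter below is a sketch of one common approach, not something provided by the SDK.

```python
import time
from collections import deque

class HourlyRateLimiter:
    """Client-side guard for an hourly request cap (e.g. 500/hour).

    Sliding-window counter: before each call, drop timestamps older
    than the window; if the window is full, wait until a slot frees up.
    """
    def __init__(self, max_requests=500, window_seconds=3600):
        self.max_requests = max_requests
        self.window = window_seconds
        self.calls = deque()  # timestamps of recent requests

    def acquire(self):
        now = time.monotonic()
        # Evict timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) >= self.max_requests:
            # Sleep until the oldest request leaves the window.
            time.sleep(self.window - (now - self.calls[0]))
            self.calls.popleft()
        self.calls.append(time.monotonic())

# Call limiter.acquire() immediately before each API request.
```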
What languages are supported?
Our evaluation models support all major languages plus code. A good rule of thumb is that if you don’t need a specialized model to deal with your language, we can handle it.
What’s the difference between reward and binary evaluation?
Reward evaluation: returns a continuous score from 0-1 measuring how well the output meets your criteria
Binary evaluation: returns a simple pass/fail result for clear-cut criteria or policy compliance
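In practice you often normalize the two modes into a single decision. The sketch below assumes a result dict carrying either a continuous `score` in [0, 1] (reward) or a boolean `passed` (binary); the field names and the 0.7 threshold are illustrative, not the exact API schema.

```python
def interpret_result(result):
    """Collapse both evaluation modes into one pass/fail decision.

    `score` / `passed` are assumed field names for this sketch.
    """
    if "passed" in result:            # binary evaluation: clear-cut criteria
        return result["passed"]
    return result["score"] >= 0.7     # reward evaluation: choose your own threshold

interpret_result({"score": 0.85})    # reward mode
interpret_result({"passed": False})  # binary mode
```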
Can I evaluate tool calls and agents, not just responses?
Yes! Composo evaluates three types of outputs:
Responses: the assistant’s latest response
Tool calls: individual tool call parameters and selection
Agents: complete end-to-end agent traces
How deterministic are the evaluation scores?
Composo provides <1% variance in scores: the same input will always produce the same output, unlike LLM-as-judge approaches, which can show >30% variance.