Introduction
Composo provides a hosted Metabase instance where you can explore and visualize your LLM evaluation data. Query your historical evaluation runs, track quality metrics over time, and build dashboards to monitor your AI applications in development and production.
Getting Started: Metabase access requires onboarding. Please email [email protected] or contact your Composo rep to get set up with your evaluation database.
What is Metabase?
Metabase is an open-source business intelligence tool that lets you ask questions about your data and visualize the answers. No SQL required for basic queries, though it’s available when you need it. For comprehensive Metabase documentation, see the Metabase Documentation.
Your Data in Composo
Your Evaluation Database
Your evaluation data is organized in a dedicated database that you can explore and query. The database contains your complete evaluation history with detailed metrics and metadata for each run.
- Request ID: Unique identifier for each evaluation request (UUID)
- Agent Instance ID: Identifier for the specific agent instance being evaluated (null for response/tool evaluations)
- Eval Type: Type of evaluation - `response` (LLM responses), `tool` (tool usage), `agent` (multi-agent traces), or `chatsession` (chat-based agent evaluations)
- Score Type: How the score should be interpreted - `reward` (continuous 0-1 score) or `binary` (pass/fail converted to 1.0/0.0)
- Name: Agent name for multi-agent evaluations (null for response/tool evaluations)
- Criteria: Full evaluation criteria text (starts with prefixes like “Reward responses”, “Agent passes if”, etc.)
- Score: Numerical result (0-1 scale, where higher is better; null if criteria not applicable)
- Explanation: Detailed reasoning and analysis behind the score
- Subject: JSON data containing what was evaluated:
  - For response/tool evaluations: `{messages, tools, system}` - the conversation and available tools
  - For agent evaluations: the specific agent instance interactions being evaluated
- Email: User who ran the evaluation
- Model Class: The evaluation model used (e.g., “align-lightning”)
- Created At: Timestamp when the evaluation was performed
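If you export rows for offline analysis, the Subject column still needs decoding from JSON. A minimal Python sketch using a made-up row shaped like the schema above (the snake_case keys and sample values are assumptions, not a guaranteed export format):

```python
import json

# Hypothetical exported evaluation row, shaped like the schema above
row = {
    "request_id": "3f6c0d2e-0000-0000-0000-000000000000",
    "eval_type": "response",
    "score_type": "reward",
    "score": 0.42,
    "subject": json.dumps({
        "messages": [
            {"role": "user", "content": "What is our refund policy?"},
            {"role": "assistant", "content": "Refunds are available within 30 days."},
        ],
        "tools": [],
        "system": "You are a helpful support agent.",
    }),
}

# Subject is stored as JSON; decode it to inspect what was evaluated
subject = json.loads(row["subject"])
last_reply = subject["messages"][-1]["content"]

# Score may be null when criteria are not applicable, so guard before comparing
is_red_flag = row["score"] is not None and row["score"] < 0.5
```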

Viewing Individual Evaluations
Click any row in your queries to see complete evaluation details including the full explanation, criteria, and subject data. This gives you full visibility into how each evaluation was scored and the reasoning behind it.
Collections
- Your personal collection: Private workspace for your analyses
- Team collections: Shared dashboards and queries (e.g., “Acme Corp Collection”)

Creating Your First Query
Basic Query: Finding Red Flags
Let’s find low-scoring evaluations that need attention:
- Click + New → Question
- Select your Evaluations table
- Click Filter → Score → Less than → enter 0.5
- Click Filter again → Created At → select your time range
- Click Visualize


Visualizing Your Data
Choosing a Visualization
After running a query, Metabase automatically suggests visualizations. Common types for evaluation data:
- Line charts: Track score trends over time
- Bar charts: Compare different agents or evaluation types
- Tables: See detailed row-by-row data
- Numbers: Display single metrics like average score or red flag rate

Summarizing Data
Aggregations and Grouping
Instead of viewing raw rows, you can summarize your data:
- Click Summarize
- Choose a metric: Count of rows, Average of Score, etc.
- Add Group by: Created At (for time series) or Name (to compare evaluations)
Some useful combinations:
- Average Score by Created At → See quality trends over time
- Count by Name → Which evaluations run most frequently
- Average Score by Agent Instance ID → Compare agent performance
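The first of these summaries, Average of Score grouped by Created At: Day, is easy to sanity-check against exported rows. A pure-Python sketch with made-up data:

```python
from collections import defaultdict
from datetime import datetime

# Made-up exported rows: (created_at ISO timestamp, score)
rows = [
    ("2024-05-01T09:15:00", 0.9),
    ("2024-05-01T14:30:00", 0.3),
    ("2024-05-02T10:00:00", 0.8),
]

# Average of Score, grouped by Created At: Day
by_day = defaultdict(list)
for created_at, score in rows:
    day = datetime.fromisoformat(created_at).date().isoformat()
    by_day[day].append(score)

daily_avg = {day: sum(scores) / len(scores) for day, scores in by_day.items()}
# daily_avg maps each day to its mean score, e.g. 2024-05-01 -> 0.6
```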
Custom Expressions: Red Flag Rate
Create a custom metric to calculate the percentage of low-scoring evaluations:
- Click Summarize → Custom Expression
- Enter an expression such as `Share([Score] < 0.5)`
- Name it “red_flag_rate”
- Group by Created At: Minute (or Hour/Day)
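The red flag rate is just the share of rows scoring below a threshold, so it can be cross-checked over exported scores. A Python sketch (the 0.5 threshold matches the red flag query above; the sample scores are made up):

```python
# Made-up scores exported from the evaluations table; None means
# the criteria were not applicable to that row
scores = [0.9, 0.2, 0.7, 0.45, None]

# Red flag rate: share of scored evaluations below 0.5
scored = [s for s in scores if s is not None]
red_flag_rate = sum(s < 0.5 for s in scored) / len(scored)
# 2 of the 4 scored rows (0.2 and 0.45) fall below 0.5
```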

Building Dashboards
Creating a Dashboard
Save your most important queries and combine them into dashboards:
- After creating a query, click Save and give it a descriptive name
- Click + New → Dashboard
- Name your dashboard (e.g., “Production Quality Monitor”)
- Click Add a saved question and select your queries
- Resize and arrange charts as needed

Dashboard Features
- Tabs: Organize related metrics (e.g., “Quality By Agent” vs “Red Flags”)
- Dashboard filters: Add filters that apply to multiple charts simultaneously
- Auto-refresh: Set dashboards to update automatically every few minutes
- Sharing: Click the sharing icon to share with teammates or generate public links
Advanced Filtering
Combine multiple filters to drill down into your data:
- Score ranges: Score is between 0.3 and 0.7
- Text search: Criteria contains “hallucination”
- Multiple time ranges: Created At is Previous 7 days AND Created At Hour of day is between 9 and 17
- Specific agents: Agent Instance ID is one of [list of IDs]
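Outside Metabase, these same conditions compose naturally in code. A Python sketch over hypothetical exported rows (field names and sample values are assumptions):

```python
from datetime import datetime

# Hypothetical exported rows
rows = [
    {"score": 0.5, "criteria": "Reward responses that avoid hallucination",
     "created_at": "2024-05-01T10:00:00", "agent_instance_id": "agent-1"},
    {"score": 0.9, "criteria": "Agent passes if tone is polite",
     "created_at": "2024-05-01T22:00:00", "agent_instance_id": "agent-2"},
]

wanted_agents = {"agent-1", "agent-3"}  # "is one of [list of IDs]"

def matches(row):
    ts = datetime.fromisoformat(row["created_at"])
    return (
        0.3 <= row["score"] <= 0.7              # Score is between 0.3 and 0.7
        and "hallucination" in row["criteria"]  # Criteria contains "hallucination"
        and 9 <= ts.hour <= 17                  # Hour of day between 9 and 17
        and row["agent_instance_id"] in wanted_agents
    )

hits = [r for r in rows if matches(r)]  # only the first row passes every filter
```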
SQL Queries (Advanced)
For complex queries, use the native SQL editor:
- Click + New → Question → Native query
- Write your SQL against the `evaluations` table
- Use variables with `{{variable_name}}` to make queries reusable
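The `{{variable_name}}` syntax is Metabase-specific, but the underlying SQL is plain. A runnable sketch against a throwaway SQLite copy of an evaluations table (column names are assumed from the schema above; Metabase itself runs against your hosted database, not SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE evaluations (name TEXT, score REAL, created_at TEXT)")
conn.executemany(
    "INSERT INTO evaluations VALUES (?, ?, ?)",
    [
        ("support-agent", 0.9, "2024-05-01T10:00:00"),
        ("support-agent", 0.2, "2024-05-01T11:00:00"),
        ("billing-agent", 0.8, "2024-05-02T09:00:00"),
    ],
)

# In Metabase this would read: WHERE score < {{threshold}}
# Here the variable becomes an ordinary bound parameter.
query = """
    SELECT name, AVG(score) AS avg_score, COUNT(*) AS n
    FROM evaluations
    WHERE score < ?
    GROUP BY name
    ORDER BY avg_score
"""
red_flags = conn.execute(query, (0.5,)).fetchall()
# Only the 0.2 support-agent row scores below 0.5
```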
Getting Help
Metabase Resources
Composo Support
- Data questions: Contact your Composo account team
- Technical support: [email protected]
- Evaluation schema: See reference below