Beta. The MCP server is a new surface; tool descriptions and behaviour may evolve as we learn how customers use it. If something isn’t working the way you’d expect, please tell us.
Introduction
Ask your AI assistant questions about your Composo evaluation data in plain English. Connect Claude Desktop, Claude Code, or any MCP-capable client to your account, and the model can pull criteria, tags, bucketed aggregates, and individual traces to answer them: no SQL, no dashboards, no per-question REST calls.
When to Use It
- Ad-hoc analysis with an LLM: ask “what’s the average helpfulness score by agent over the last week?” or “did anything regress in the last month?” and let the model call the right tools, instead of clicking through dashboards or writing a query.
- Trace debugging: pull low-scoring traces, or traces matching a filter, straight into an LLM session for inspection.
- Embedding evaluation insight into your own app: any agent that already draws its tools from MCP servers can add these read-only tools to its catalogue.
Connect
Pick the client you’re using. The same API key works across all of them; generate one from the API Keys settings page. Either an API-Key header or Authorization: Bearer <key> is accepted.
Claude Desktop
Add an entry under mcpServers in your Claude Desktop config (Settings → Developer → Edit Config):
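Entries under mcpServers describe a command for Claude Desktop to launch locally, so one common way to reach a remote HTTP server from that file is a stdio-to-HTTP bridge such as the community mcp-remote npm package. The sketch below assumes that bridge (this page does not confirm it) and builds the entry in Python, merging it into an existing config without touching other servers. The server name composo and the COMPOSO_API_KEY variable are hypothetical choices.

```python
import json

MCP_URL = "https://platform.composo.ai/mcp"


def composo_desktop_entry() -> dict:
    """Build an mcpServers entry that bridges Claude Desktop to the remote server.

    Assumption: the community mcp-remote package acts as the bridge and
    expands ${COMPOSO_API_KEY} from the entry's env block.
    """
    return {
        "command": "npx",
        "args": [
            "mcp-remote",
            MCP_URL,
            "--header",
            # No space after the colon: spaces inside args are reportedly
            # mishandled by some clients.
            "API-Key:${COMPOSO_API_KEY}",
        ],
        # Hypothetical placeholder; paste the key from the API Keys settings page.
        "env": {"COMPOSO_API_KEY": "<your-api-key>"},
    }


def add_to_config(config: dict, name: str = "composo") -> dict:
    """Return a copy of config with the entry added under mcpServers."""
    merged = dict(config)
    # Copy the existing servers so the caller's dict is left unchanged.
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = composo_desktop_entry()
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Print the JSON to paste into the file opened via Edit Config.
    print(json.dumps(add_to_config({}), indent=2))
```

Paste the printed block into the config file and restart Claude Desktop so it picks up the new server.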
mcp CLI
Claude Code
Once the server is registered, new claude sessions will load it and surface its tools in-session.
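Claude Code registers remote servers with its claude mcp add command. The sketch below assumes the http transport and the --header option described in Claude Code's own MCP documentation; it only builds the argument list, so you can inspect it before running it. The server name composo is a hypothetical choice.

```python
import shlex

MCP_URL = "https://platform.composo.ai/mcp"


def claude_mcp_add_argv(api_key: str, name: str = "composo", use_bearer: bool = False) -> list[str]:
    """Build the argv that registers the Composo server with Claude Code.

    Assumes: claude mcp add --transport http <name> <url> --header "<Header: value>"
    """
    # Either header form is accepted by the server.
    header = f"Authorization: Bearer {api_key}" if use_bearer else f"API-Key: {api_key}"
    return ["claude", "mcp", "add", "--transport", "http", name, MCP_URL, "--header", header]


if __name__ == "__main__":
    # Show the equivalent shell command with a placeholder instead of a real key.
    print(shlex.join(claude_mcp_add_argv("<your-api-key>")))
```

To actually register the server, pass the list to subprocess.run(..., check=True) with your real key.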
Generic MCP client
Any MCP client that supports the Streamable HTTP transport can connect: point it at https://platform.composo.ai/mcp and send the API-Key header (or Authorization: Bearer <key>) on each request.
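Under the Streamable HTTP transport, each client message is a JSON-RPC 2.0 object POSTed to the single endpoint, with an Accept header that allows both application/json and text/event-stream. The sketch below builds (but does not send) such requests using only Python's standard library. It assumes the 2025-03-26 protocol revision; the client name and the helper names are illustrative, not part of Composo's documentation.

```python
import json
import urllib.request
from typing import Any, Optional

MCP_URL = "https://platform.composo.ai/mcp"


def auth_headers(api_key: str, use_bearer: bool = False) -> dict[str, str]:
    """Either header form is accepted; pick one."""
    if use_bearer:
        return {"Authorization": f"Bearer {api_key}"}
    return {"API-Key": api_key}


def jsonrpc_request(
    api_key: str,
    method: str,
    params: Optional[dict[str, Any]] = None,
    request_id: Optional[int] = None,
    session_id: Optional[str] = None,
    use_bearer: bool = False,
) -> urllib.request.Request:
    """Build a POST carrying one JSON-RPC message.

    A request_id of None produces a notification (no "id" member).
    """
    message: dict[str, Any] = {"jsonrpc": "2.0", "method": method}
    if request_id is not None:
        message["id"] = request_id
    if params is not None:
        message["params"] = params
    headers = {
        "Content-Type": "application/json",
        # The server may answer with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
        **auth_headers(api_key, use_bearer),
    }
    if session_id is not None:
        # Echo the session id the server returned from initialize, if any.
        headers["Mcp-Session-Id"] = session_id
    return urllib.request.Request(
        MCP_URL, data=json.dumps(message).encode("utf-8"), headers=headers, method="POST"
    )


# First message of a session (built only, not sent).
init = jsonrpc_request(
    "<your-api-key>",
    "initialize",
    {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "example-client", "version": "0.1.0"},
    },
    request_id=1,
)
```

Send each request with urllib.request.urlopen. If the initialize response carries an Mcp-Session-Id header, pass it as session_id on the notifications/initialized notification and on later calls such as tools/list.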
What’s Available
Once connected, your client sees six tools covering discovery, aggregation, and individual-trace browsing. There is no further setup; just ask.
- list_criteria: discover the evaluation criteria seen in your domain.
- list_tag_keys / list_tag_values: discover the tag keys (e.g. agent, environment) and the distinct values used.
- get_insights: bucketed aggregates (avg, count, stddev, min, max) per criterion over an arbitrary filter set.
- get_grouped_insights: the same aggregates, broken down by a tag value (by agent, by customer, etc.).
- list_traces: page through individual traces with their full content and nested evaluations, optionally filtered by date, tag, criterion, or score.
For aggregate questions use get_insights or get_grouped_insights; for individual examples use list_traces (paged 10 at a time).
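To make the aggregate columns concrete, the sketch below computes avg, count, stddev, min, and max per criterion over hypothetical evaluation records, bucketed by day and optionally broken down by a tag. That is roughly the shape of what get_insights and get_grouped_insights report. It is a local illustration only: the Evaluation record, daily buckets, and population standard deviation are assumptions, not the server's implementation.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from statistics import fmean, pstdev
from typing import Optional


@dataclass
class Evaluation:
    """Hypothetical flattened record: one criterion score on one trace."""
    day: date
    criterion: str
    score: float
    tags: dict[str, str] = field(default_factory=dict)


def bucketed_insights(evals: list[Evaluation], group_by: Optional[str] = None) -> dict:
    """Aggregate scores per (criterion, day) bucket, plus the tag value if grouping."""
    buckets: dict[tuple, list[float]] = defaultdict(list)
    for ev in evals:
        key: tuple = (ev.criterion, ev.day)
        if group_by is not None:
            # Records missing the tag land in a None group rather than being dropped.
            key += (ev.tags.get(group_by),)
        buckets[key].append(ev.score)
    return {
        key: {
            "avg": fmean(scores),
            "count": len(scores),
            # Population stddev, so a single-score bucket reports 0.0 instead of raising.
            "stddev": pstdev(scores),
            "min": min(scores),
            "max": max(scores),
        }
        for key, scores in buckets.items()
    }
```

Calling bucketed_insights(evals, group_by="agent") mirrors the "by agent" breakdown; without group_by it mirrors the ungrouped view.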