Introduction
Composo runs a public Model Context Protocol (MCP) server so you can query your evaluation data (criteria, tags, bucketed aggregates, and individual traces) from any MCP-capable LLM client: Claude Desktop, Claude Code, the mcp CLI, or custom apps using the official SDK.
The server is read-only and tenant-scoped: tools see only your own domain’s data, derived automatically from the API key you connect with. There is no “domain” or “customer” argument on any tool.
When to Use It
- Ad-hoc analysis with an LLM: ask Claude “what’s the average helpfulness score by agent over the last week?” and let it call the right tool, instead of clicking through dashboards or writing a query.
- Trace debugging: surface low-scoring or filter-narrowed examples directly into an LLM session for inspection.
- Embedding evaluation insight into your own app: any tool catalogue your agent already exposes via MCP can include these read-side surfaces.
Connect
The production MCP endpoint is https://platform.composo.ai/mcp. Authenticate with your Composo API key, sent either in the API-Key header or as Authorization: Bearer <key>. Both are accepted, so client schemas that only support Bearer-style auth still work.
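As a minimal sketch of the two accepted authentication forms, the helper below builds the header for either style. The function name `auth_headers` and its `style` argument are hypothetical, introduced only for illustration; the header names and the endpoint come from this page.

```python
# Production endpoint as documented on this page.
ENDPOINT = "https://platform.composo.ai/mcp"


def auth_headers(api_key: str, style: str = "api-key") -> dict[str, str]:
    """Return the HTTP header carrying the key in one of the two accepted forms.

    `auth_headers` and `style` are illustrative names, not part of any Composo SDK.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if style == "api-key":
        # Dedicated header form.
        return {"API-Key": api_key}
    if style == "bearer":
        # Bearer form, for clients that only support Authorization-style auth.
        return {"Authorization": f"Bearer {api_key}"}
    raise ValueError(f"unknown auth style: {style!r}")
```

Either dictionary can be passed as the extra-headers option of whichever MCP client you use; only one of the two headers is needed per request.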
Claude Desktop
Add an entry under mcpServers in your Claude Desktop config (Settings → Developer → Edit Config).
mcp CLI
Claude Code
Once the server is registered, new claude sessions load it and surface its tools in-session.
Generic MCP client
Any MCP client that supports the Streamable HTTP transport can connect: point it at https://platform.composo.ai/mcp and send API-Key (or Authorization: Bearer) on each request.
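For a custom client, the first message over Streamable HTTP is a JSON-RPC `initialize` request sent by POST. The sketch below only constructs that request with the Python standard library and does not send it. The protocol version string and the `clientInfo` values are assumptions; check which MCP revision your client and the server negotiate. In practice an MCP SDK handles this handshake for you.

```python
import json
import urllib.request

ENDPOINT = "https://platform.composo.ai/mcp"


def build_initialize_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an MCP initialize request for Streamable HTTP."""
    body = {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "initialize",
        "params": {
            # Assumed protocol revision; the server may negotiate a different one.
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            # Placeholder client identity, purely illustrative.
            "clientInfo": {"name": "example-client", "version": "0.1.0"},
        },
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept both plain JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
            # Authorization: Bearer <key> works here as well.
            "API-Key": api_key,
        },
    )


request = build_initialize_request("YOUR_API_KEY")
```

Passing `request` to `urllib.request.urlopen` would perform the call. Note that `urllib` normalizes header capitalization internally, which is harmless because HTTP header names are case-insensitive.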
What’s Available
The server exposes six read-only tools covering discovery, aggregation, and individual-trace browsing. The tool catalogue and per-tool descriptions are served by the server itself, so your LLM client sees them automatically on connect, and they cannot drift from what the server actually implements. At a glance:
- list_criteria: discover the evaluation criteria seen in your domain.
- list_tag_keys / list_tag_values: discover the tag keys (e.g. agent, environment) and the distinct values used.
- get_insights: bucketed aggregates (avg, count, stddev, min, max) per criterion over an arbitrary filter set.
- get_grouped_insights: the same aggregates, broken down by a tag value (by agent, by customer, etc.).
- list_traces: page through individual traces with their full content and nested evaluations, optionally filtered by date, tag, criterion, or score.
The filter-taking tools share a common filters object: date range, score range, criteria list, and tag filters (a key plus a list of allowed values). list_traces is hard-capped at 10 rows per page, so use the aggregate tools for population-level questions.
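To make the filter shape concrete, here is a sketch of a JSON-RPC `tools/call` payload for get_insights. The `tools/call` envelope follows the MCP specification, but every field name inside `filters` (`date_range`, `score_range`, `criteria`, `tags`, and their sub-keys) is a hypothetical stand-in: the authoritative schema is the one the server advertises to your client on connect.

```python
def build_insights_call(request_id, date_from, date_to, criteria, tags,
                        score_min=None, score_max=None):
    """Build a tools/call payload for get_insights.

    All keys inside `filters` are assumed names for illustration only;
    consult the schema served by the MCP server for the real ones.
    """
    filters = {
        "date_range": {"from": date_from, "to": date_to},
        "criteria": list(criteria),
        # Each tag filter is a key plus a list of allowed values.
        "tags": [{"key": key, "values": list(values)} for key, values in tags.items()],
    }
    # Only include a score range when at least one bound is given.
    if score_min is not None or score_max is not None:
        filters["score_range"] = {"min": score_min, "max": score_max}
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_insights", "arguments": {"filters": filters}},
    }


call = build_insights_call(
    request_id=2,
    date_from="2025-01-01",
    date_to="2025-01-08",
    criteria=["helpfulness"],
    tags={"agent": ["support-bot"]},  # illustrative tag value
)
```

The same filters dictionary could be reused for get_grouped_insights or list_traces by changing `name`, assuming those tools accept it under the same argument.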