How it works
Most evaluation workloads send one trace, several criteria: the same agent conversation is scored against criterion A, then B, then C — each as a separate request. The criteria differ, but the trace is identical across all of them. Composo caches that shared trace:- The first evaluation of a trace primes the cache.
- Any subsequent evaluation of the same trace within 30 minutes reuses it. In the one-trace-many-criteria pattern, that means every criterion after the first is cached.
What it costs
Cached trace tokens bill at 40% of the normal per-token rate — a 60% discount on the reused portion. Concretely, a request’s billable tokens are:Caching never changes your scores, your responses, or which requests succeed. It
only affects how the reused trace tokens are priced. The first call on a trace is
always billed in full (there’s nothing to reuse yet).
Getting the most out of it
- Batch criteria for the same trace close together. Send the criteria for a given trace within the 30-minute window so the 2nd onward land on the cache.
- Reuse traces verbatim. The cached trace must be byte-for-byte identical; any change to the messages starts a fresh trace.
- It applies to the
align-20260109model core (the current default). Lightning variants don’t participate.
Seeing your caching
The/usage page has a Trace Caching card and a Cached tokens column in
the per-model table:
- Tokens reused — the share of your cacheable trace tokens that were served from cache.
- Cache hit rate — the share of your evaluations that reused a cached trace.
- Cached tokens (per-model table) — the trace tokens that earned the discount.