When you evaluate the same trace under multiple criteria, Composo reuses the shared part of the prompt instead of re-processing it every time — and bills those reused tokens at a reduced rate.

How it works

Most evaluation workloads follow a one-trace, several-criteria pattern: the same agent conversation is scored against criterion A, then B, then C, each as a separate request. The criteria differ, but the trace is identical across all of them. Composo caches that shared trace:
  • The first evaluation of a trace primes the cache.
  • Any subsequent evaluation of the same trace within 30 minutes reuses it. In the one-trace-many-criteria pattern, that means every evaluation after the first reuses the cached trace.
The cache key is the trace itself. The per-request criterion is not part of it (that’s the thing that changes each call), so varying the criterion doesn’t break the cache — which is exactly why “one trace, N criteria” benefits.
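The keying behavior described above can be modeled with a small in-memory sketch. This is a toy illustration, not Composo's implementation: the `trace_key` function, the `TraceCache` class, the use of SHA-256 over the serialized messages, and measuring the 30-minute window from the priming call are all assumptions made for the example.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 30 * 60  # the 30-minute window described above


def trace_key(messages):
    """Hash only the trace; the criterion is deliberately left out of the key."""
    # Assumption: the trace is serialized as-is, so any change to the
    # messages (content, order, extra turns) produces a different key.
    serialized = json.dumps(messages)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


class TraceCache:
    """Hypothetical in-memory model of the caching behavior."""

    def __init__(self):
        self._primed_at = {}  # trace key -> time (seconds) the trace was cached

    def evaluate(self, messages, criterion, now):
        """Return True if this evaluation would reuse a cached trace."""
        # `criterion` is accepted but never used for the lookup.
        key = trace_key(messages)
        primed = self._primed_at.get(key)
        if primed is not None and now - primed <= CACHE_TTL_SECONDS:
            return True
        # Cache miss (first sight or expired): this call primes the cache.
        self._primed_at[key] = now
        return False
```

In this model, calling `evaluate` with the same messages and three different criteria within the window returns `False` once (the priming call) and `True` twice.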

What it costs

Cached trace tokens bill at 40% of the normal per-token rate — a 60% discount on the reused portion. Concretely, a request’s billable tokens are:
billable_tokens = tokens_used − 0.6 × cached_trace_tokens
So for a trace evaluated under 4 criteria within 30 minutes, the trace tokens of the 2nd, 3rd, and 4th calls are each discounted by 60%. Only the trace is discounted — the criterion and other per-request content always bill at the full rate.
Caching never changes your scores, your responses, or which requests succeed. It only affects how the reused trace tokens are priced. The first call on a trace is always billed in full (there’s nothing to reuse yet).
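As a worked example of the formula above, here is a minimal sketch for one trace scored under 4 criteria. The token counts (a 10,000-token trace and 200 tokens of per-request content) are made-up illustrative numbers, and the function name is hypothetical.

```python
CACHED_DISCOUNT_PERCENT = 60  # cached trace tokens bill at 40% of the normal rate


def billable_tokens(tokens_used, cached_trace_tokens):
    """billable_tokens = tokens_used - 0.6 * cached_trace_tokens"""
    return tokens_used - cached_trace_tokens * CACHED_DISCOUNT_PERCENT / 100


# Illustrative numbers only.
trace_tokens = 10_000
per_request_tokens = 200
tokens_used = trace_tokens + per_request_tokens

# First call primes the cache (nothing cached yet); calls 2-4 reuse the trace.
calls = [(tokens_used, 0)] + [(tokens_used, trace_tokens)] * 3

with_caching = sum(billable_tokens(used, cached) for used, cached in calls)
without_caching = sum(used for used, _ in calls)
```

With these numbers, the first call bills 10,200 tokens and each of the next three bills 4,200, for a total of 22,800 billable tokens instead of 40,800 without caching.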

Getting the most out of it

  • Batch criteria for the same trace close together. Send all the criteria for a given trace within the 30-minute window so that the second and later requests hit the cache.
  • Reuse traces verbatim. The cached trace must be byte-for-byte identical; any change to the messages starts a fresh trace.
  • Use a supported model. Caching applies to the align-20260109 model core (the current default). Lightning variants do not participate.
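One way to follow the batching tip is to reorder a job queue so that all criteria for a trace are sent back to back. The sketch below assumes jobs are represented as `(trace_id, criterion)` pairs; that representation and the `group_by_trace` helper are hypothetical, and the actual request call is left out.

```python
from collections import defaultdict


def group_by_trace(jobs):
    """Group (trace_id, criterion) pairs so each trace's criteria are adjacent.

    Traces keep the order in which they first appear; criteria keep their
    original order within each trace.
    """
    grouped = defaultdict(list)
    for trace_id, criterion in jobs:
        grouped[trace_id].append(criterion)
    return list(grouped.items())


# Interleaved jobs: sent as-is, a slow queue could push a trace's later
# criteria outside the 30-minute window.
jobs = [("t1", "A"), ("t2", "A"), ("t1", "B"), ("t2", "B")]
batches = group_by_trace(jobs)
```

Here `batches` is `[("t1", ["A", "B"]), ("t2", ["A", "B"])]`, so each trace's criteria can be submitted consecutively.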

Seeing your caching

The /usage page has a Trace Caching card and a Cached tokens column in the per-model table:
  • Tokens reused — the share of your cacheable trace tokens that were served from cache.
  • Cache hit rate — the share of your evaluations that reused a cached trace.
  • Cached tokens (per-model table) — the trace tokens that earned the discount.
The credits shown already reflect the discount, so your Credits column is what you actually pay after caching.

Questions

Reach out to [email protected] — we’re happy to walk through your caching numbers or how to structure requests to benefit from it.