Trace caching

How it works

Most evaluation workloads send one trace, several criteria: the same agent conversation is scored against criterion A, then B, then C — each as a separate request. The criteria differ, but the trace is identical across all of them.

Composo caches that shared trace:

The first evaluation of a trace primes the cache.

Any subsequent evaluation of the same trace within 30 minutes reuses it. In the one-trace-many-criteria pattern, that means every criterion after the first is cached.

The cache key is the trace itself. The per-request criterion is not part of it (that’s the thing that changes each call), so varying the criterion doesn’t break the cache — which is exactly why “one trace, N criteria” benefits.

What it costs

Cached trace tokens bill at 40% of the normal per-token rate — a 60% discount on the reused portion.

Concretely, a request’s billable tokens are:

billable_tokens = tokens_used − 0.6 × cached_trace_tokens

So for a trace evaluated under 4 criteria within 30 minutes, the trace tokens of the 2nd, 3rd, and 4th calls are each discounted by 60%. Only the trace is discounted — the criterion and other per-request content always bill at the full rate.

Caching never changes your scores, your responses, or which requests succeed. It only affects how the reused trace tokens are priced. The first call on a trace is always billed in full (there’s nothing to reuse yet).

Getting the most out of it

Batch criteria for the same trace close together. Send the criteria for a given trace within the 30-minute window so the 2nd onward land on the cache.

Reuse traces verbatim. The cached trace must be byte-for-byte identical; any change to the messages starts a fresh trace.

It applies to the align-20260109 model core (the current default). Lightning variants don’t participate.

Seeing your caching

The /usage page has a Trace Caching card and a Cached tokens column in the per-model table:

Tokens reused — the share of your cacheable trace tokens that were served from cache.

Cache hit rate — the share of your evaluations that reused a cached trace.

Cached tokens (per-model table) — the trace tokens that earned the discount.

The credits shown already reflect the discount, so your Credits column is what you actually pay after caching.

Getting Started

Criteria Guide

Testing

Monitoring

Cookbooks

Community Examples

Billing

How it works

What it costs

Getting the most out of it

Seeing your caching

Questions

​How it works

​What it costs

​Getting the most out of it

​Seeing your caching

​Questions

How it works

What it costs

Getting the most out of it

Seeing your caching

Questions