How Claude Cache Reads Cost $0.50 vs $5.00
Every time Claude reads from cache instead of processing raw input, you pay $0.50 per million tokens on Opus 4.7 instead of $5.00. That 10x price difference is the single largest cost reduction lever available in the Anthropic API, and most teams never activate it.
The Setup
You are building a document analysis pipeline. Each request sends a 100,000-token reference document plus a short query. You process 5 queries against the same document in sequence.
Without caching, you pay for 500,000 input tokens at full price. With caching, you pay for one cache write at 1.25x and four cache reads at 0.1x. The total drops from $2.50 to $0.825 – a 67% reduction on Opus 4.7. That is Anthropic’s own published example.
The Math
Anthropic’s verified example: 100K cached tokens, 5 requests, Opus 4.7
Without caching:
- 5 requests x 100,000 tokens x $5.00/MTok = $2.50
With 5-minute caching:
- 1 cache write: 100,000 tokens x $6.25/MTok = $0.625
- 4 cache reads: 4 x 100,000 tokens x $0.50/MTok = $0.20
- Total: $0.825
Savings: $1.675 (67%)
The pricing structure across all models follows the same 0.1x multiplier for reads:
| Model | Standard Input | Cache Read | Savings |
|---|---|---|---|
| Opus 4.7 | $5.00/MTok | $0.50/MTok | 90% |
| Sonnet 4.6 | $3.00/MTok | $0.30/MTok | 90% |
| Haiku 4.5 | $1.00/MTok | $0.10/MTok | 90% |
The Technique
The key to maximizing cache reads is structuring your requests so the cacheable prefix remains identical across calls. Any change in the cached prefix – even a single character – invalidates the cache and triggers a new write.
import anthropic
import json
client = anthropic.Anthropic()
def analyze_document(document: str, queries: list[str]) -> list[str]:
"""Send multiple queries against the same cached document."""
results = []
for query in queries:
response = client.messages.create(
model="claude-opus-4-7-20250415",
max_tokens=2048,
system=[
{
"type": "text",
"text": document, # This gets cached after first call
"cache_control": {"type": "ephemeral"}
}
],
messages=[
{"role": "user", "content": query}
]
)
usage = response.usage
cost_type = "WRITE" if usage.cache_creation_input_tokens > 0 else "READ"
cached_tokens = (
usage.cache_read_input_tokens or
usage.cache_creation_input_tokens
)
print(f"[{cost_type}] Cached: {cached_tokens} tokens")
results.append(response.content[0].text)
return results
# Usage
document = open("contract.txt").read() # 100K tokens
queries = [
"What are the payment terms?",
"List all liability clauses.",
"Summarize termination conditions.",
"What are the renewal options?",
"Identify any non-compete provisions."
]
answers = analyze_document(document, queries)
The first call triggers a cache write at $6.25/MTok for Opus 4.7. The remaining four calls hit the cache at $0.50/MTok. Total input cost for 100K tokens across 5 queries: $0.825 instead of $2.50.
Three rules for maximizing cache read ratio:
- Put static content first. The cache is a prefix cache. Only content at the beginning of the message sequence, up to a
cache_controlbreakpoint, gets cached. - Keep the prefix byte-identical. Do not inject timestamps, request IDs, or other per-request values into cached content.
- Pace your requests. The default TTL is 5 minutes, refreshed on each hit. If you have bursts with gaps exceeding 5 minutes, consider the 1-hour cache (2.0x write cost, same 0.1x read cost).
The Tradeoffs
Cache reads deliver 90% savings only when the cached content stays identical. These scenarios reduce or eliminate the benefit:
- Dynamic system prompts: If you inject user-specific data into the system prompt, each variation creates a separate cache entry. The write cost accumulates with no reads.
- Infrequent access: With the default 5-minute TTL, a prompt used once per hour gets written 12 times a day with zero reads. Each write costs 1.25x base, making it 25% more expensive than no caching at all.
- Short prompts on Opus/Haiku: The 4,096-token minimum means prompts under that threshold cannot be cached regardless of frequency.
For maximum savings, combine cache reads with batch processing. Batch mode halves the base input price (Opus 4.7 drops from $5.00 to $2.50/MTok), and cache reads are 10% of the base price ($0.50/MTok standard, $0.25/MTok batch). This combination achieves a 95% reduction compared to uncached standard pricing. For 100M cached input tokens per month on batch Opus, the cost drops from $500.00 (uncached standard) to $25.00 (batch + cache read). The prerequisite is that your workload can tolerate the up-to-24-hour batch processing window and that your cached prefix remains identical across batch requests.
Implementation Checklist
- Audit your API calls for repeated input content longer than your model’s minimum cache threshold
- Separate static content (documentation, rules, examples) from dynamic content (user queries)
- Place all static content in the system message with a
cache_controlbreakpoint - Log
cache_read_input_tokensandcache_creation_input_tokensfrom every response - Alert if cache read ratio drops below 90% during business hours
- Review cache write frequency weekly to identify wasted writes
Measuring Impact
Compare your input token costs for the week before and after enabling caching:
- Target metric: Cache read ratio above 90% (reads / total cached interactions)
- Dollar validation: Input cost per 1,000 requests should drop by 80-90%
- Anomaly detection: A sudden spike in
cache_creation_input_tokensmeans your prefix is changing unexpectedly - Export usage data from the Anthropic console and calculate effective input price per million tokens
Calculate your effective per-token rate as a sanity check: divide your total daily input spend by total input tokens processed. With healthy caching on Opus 4.7, this number should be well below $1.00/MTok (closer to $0.50-$0.70/MTok depending on cache hit rate). If it is above $3.00/MTok, your caching is not activating properly – investigate minimum token thresholds and prefix consistency.