Prompt Caching Break-Even Calculator for Claude
The 5-minute cache on Claude breaks even after exactly 1 cache read. The 1-hour cache breaks even after 2 reads. These are not approximations – they are provable from the pricing multipliers: 1.25x write + 0.1x read for 5-minute, 2.0x write + 0.1x read for 1-hour.
The Setup
You are evaluating whether to enable prompt caching on your Claude API integration. Your system prompt is 20,000 tokens and you make between 10 and 500 requests per day on Sonnet 4.6. You need to know the exact point where caching starts saving money and how much you save beyond that point.
Without caching, you spend $1.20/day at 200 requests (200 x 20K x $3.00/MTok). With caching, you spend $0.134/day. But you need the formula to calculate this for your specific workload, model, and prompt size.
The Math
Break-even formula for 5-minute cache:
Let N = number of requests within a 5-minute window.
- Cost without caching: N x tokens x base_price
- Cost with caching: (1 x tokens x 1.25 x base_price) + ((N-1) x tokens x 0.1 x base_price)
Break-even when cached cost < uncached cost:
- 1.25 + 0.1(N-1) < N
- 1.25 + 0.1N - 0.1 < N
- 1.15 < 0.9N
- N > 1.28
Result: 2 total requests (1 write + 1 read) breaks even.
Break-even formula for 1-hour cache:
- 2.0 + 0.1(N-1) < N
- 1.9 < 0.9N
- N > 2.11
Result: 3 total requests (1 write + 2 reads) breaks even.
Savings at scale (Sonnet 4.6, 20K tokens):
| Requests/day | No cache cost | 5-min cached | Savings |
|---|---|---|---|
| 10 | $0.60 | $0.129 | $0.47 (78%) |
| 100 | $6.00 | $0.669 | $5.33 (89%) |
| 1,000 | $60.00 | $6.07 | $53.93 (90%) |
The Technique
Here is a Python calculator that computes break-even and projected savings for any configuration:
def cache_calculator(
model_input_price_per_mtok: float,
cached_tokens: int,
requests_per_window: int,
cache_ttl: str = "5m",
windows_per_day: int = 1
) -> dict:
"""Calculate caching costs and break-even points.
Args:
model_input_price_per_mtok: Base input price (e.g., 5.00 for Opus 4.7)
cached_tokens: Number of tokens in the cached prefix
requests_per_window: Requests within one cache TTL window
cache_ttl: "5m" or "1h"
windows_per_day: How many cache windows occur per day
"""
write_multiplier = 1.25 if cache_ttl == "5m" else 2.0
read_multiplier = 0.1
base_price = model_input_price_per_mtok / 1_000_000
# Cost without caching
uncached_per_window = requests_per_window * cached_tokens * base_price
# Cost with caching (1 write + N-1 reads per window)
write_cost = cached_tokens * base_price * write_multiplier
read_cost = max(0, requests_per_window - 1) * cached_tokens * base_price * read_multiplier
cached_per_window = write_cost + read_cost
# Daily costs
uncached_daily = uncached_per_window * windows_per_day
cached_daily = cached_per_window * windows_per_day
savings_daily = uncached_daily - cached_daily
# Break-even
breakeven_n = (write_multiplier - 0.1) / 0.9
breakeven_requests = int(breakeven_n) + 1 # Round up
return {
"uncached_daily": f"${uncached_daily:.2f}",
"cached_daily": f"${cached_daily:.2f}",
"savings_daily": f"${savings_daily:.2f}",
"savings_pct": f"{(savings_daily/uncached_daily)*100:.1f}%",
"breakeven_requests": breakeven_requests,
"monthly_savings": f"${savings_daily * 30:.2f}"
}
# Example: Sonnet 4.6, 50K system prompt, 200 requests in 5-min windows
result = cache_calculator(
model_input_price_per_mtok=3.00,
cached_tokens=50_000,
requests_per_window=200,
cache_ttl="5m",
windows_per_day=12 # Active for ~1 hour with 5-min windows
)
print(result)
# {'uncached_daily': '$360.00', 'cached_daily': '$38.07',
# 'savings_daily': '$321.93', 'savings_pct': '89.4%',
# 'breakeven_requests': 2, 'monthly_savings': '$9657.90'}
For a quick command-line check without Python:
# Quick break-even check: does caching save money?
# Inputs: base_price_per_mtok, cached_tokens, requests_per_window
python3 -c "
base=5.00; tokens=100000; reqs=5; ttl='5m'
w = 1.25 if ttl=='5m' else 2.0
uncached = reqs * tokens * base / 1e6
cached = tokens * base * w / 1e6 + (reqs-1) * tokens * base * 0.1 / 1e6
print(f'Uncached: \${uncached:.3f}')
print(f'Cached: \${cached:.3f}')
print(f'Savings: \${uncached - cached:.3f} ({(uncached-cached)/uncached*100:.0f}%)')
"
The calculator handles edge cases: if requests_per_window is 1, the cached cost equals the write cost (1.25x or 2.0x base), which is more expensive than not caching. The tool correctly flags this as a net loss.
The Tradeoffs
The break-even math assumes ideal conditions that may not hold:
- Cache misses from prompt changes: If you update your system prompt mid-day, every cache entry is invalidated. Frequent prompt iteration during development makes caching counterproductive.
- Multi-workspace deployments: Caches are isolated per workspace since February 2026. Running the same prompt across 3 workspaces means 3 separate cache writes.
- Token counting uncertainty: The minimum cacheable tokens vary by model (4,096 for Opus 4.7, 1,024 for Sonnet 4.6). If your prompt is near the threshold, small edits might push it below and silently disable caching.
- Output tokens unaffected: Caching only reduces input costs. If your workload is output-heavy, the percentage savings on total cost will be smaller than the input savings suggest.
Implementation Checklist
- Run the calculator with your actual model, prompt size, and request volume
- Verify your prompt exceeds the minimum cacheable token threshold
- Confirm requests per cache window exceeds the break-even count (2 for 5-min, 3 for 1-hour)
- Implement caching with monitoring on
cache_read_input_tokens - Compare predicted savings from the calculator against actual API billing after one week
- Re-run the calculator monthly as request volumes change
Measuring Impact
Validate your calculator predictions against reality:
- Predicted vs actual daily cost: Should match within 10% if request patterns are stable
- Cache hit rate: Reads / (reads + writes) should match your requests_per_window assumption
- Monthly ROI: Track cumulative savings. At $4,045/month saved (the support bot example), caching pays for the engineering time in the first day
- Export Anthropic billing data weekly and compare against calculator projections