Prompt Caching Break-Even Calculator for Claude

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS · More at zovo.one

The 5-minute cache on Claude breaks even after exactly 1 cache read. The 1-hour cache breaks even after 2 reads. These are not approximations; they follow directly from the pricing multipliers: a 1.25x write plus 0.1x reads for the 5-minute cache, and a 2.0x write plus 0.1x reads for the 1-hour cache.

The Setup

You are evaluating whether to enable prompt caching on your Claude API integration. Your system prompt is 20,000 tokens and you make between 10 and 500 requests per day on Sonnet 4.6. You need to know the exact point where caching starts saving money and how much you save beyond that point.

Without caching, you spend $12.00/day at 200 requests (200 x 20K x $3.00/MTok). With caching, assuming all 200 requests land in one warm cache window, you spend about $1.27/day (one 1.25x write plus 199 reads at 0.1x). But you need the formula to calculate this for your specific workload, model, and prompt size.

The Math

Break-even formula for 5-minute cache:

Let N = number of requests within a 5-minute window. Each window pays 1 cache write (1.25x) and N - 1 cache reads (0.1x each), versus N full-price input reads without caching.

Break-even when cached cost < uncached cost:

1.25 + 0.1(N - 1) < N
1.15 < 0.9N
N > 1.278

Result: 2 total requests (1 write + 1 read) breaks even.

Break-even formula for 1-hour cache:

2.0 + 0.1(N - 1) < N
1.9 < 0.9N
N > 2.111

Result: 3 total requests (1 write + 2 reads) breaks even.
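
These break-even counts are easy to sanity-check numerically. The script below is a standalone verification, measuring cost in units of "base input price x prompt tokens" so it works for any model and prompt size:

```python
def uncached(n: int) -> float:
    """Cost of n requests at full input price (in base-price units)."""
    return float(n)

def cached(n: int, write_mult: float) -> float:
    """Cost of 1 cache write + (n - 1) cache reads at 0.1x."""
    return write_mult + 0.1 * (n - 1)

# 5-minute cache (1.25x write): loses money at N=1, wins from N=2
assert cached(1, 1.25) > uncached(1)  # 1.25 > 1.00
assert cached(2, 1.25) < uncached(2)  # 1.35 < 2.00

# 1-hour cache (2.0x write): loses through N=2, wins from N=3
assert cached(2, 2.0) > uncached(2)   # 2.10 > 2.00
assert cached(3, 2.0) < uncached(3)   # 2.20 < 3.00

print("break-even verified: N=2 (5-min), N=3 (1-hour)")
```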

Savings at scale (Sonnet 4.6, 20K tokens):

| Requests/day | No cache cost | 5-min cached | Savings       |
|--------------|---------------|--------------|---------------|
| 10           | $0.60         | $0.129       | $0.47 (78%)   |
| 100          | $6.00         | $0.669       | $5.33 (89%)   |
| 1,000        | $60.00        | $6.07        | $53.93 (90%)  |
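
The table can be regenerated from the same per-window formula. This sketch assumes the article's $3.00/MTok Sonnet rate, a 20K-token cached prefix, and all of a day's requests landing in one warm cache window:

```python
PRICE_PER_TOK = 3.00 / 1_000_000  # base input price, $/token
TOKENS = 20_000                   # cached prefix size

for reqs in (10, 100, 1_000):
    no_cache = reqs * TOKENS * PRICE_PER_TOK
    # 1 write at 1.25x + (reqs - 1) reads at 0.1x
    cached = TOKENS * PRICE_PER_TOK * (1.25 + 0.1 * (reqs - 1))
    saved = no_cache - cached
    print(f"{reqs:>5}  ${no_cache:.2f}  ${cached:.3f}  "
          f"${saved:.2f} ({saved / no_cache * 100:.0f}%)")
```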

The Technique

Here is a Python calculator that computes break-even and projected savings for any configuration:

```python
def cache_calculator(
    model_input_price_per_mtok: float,
    cached_tokens: int,
    requests_per_window: int,
    cache_ttl: str = "5m",
    windows_per_day: int = 1
) -> dict:
    """Calculate caching costs and break-even points.

    Args:
        model_input_price_per_mtok: Base input price (e.g., 5.00 for Opus 4.7)
        cached_tokens: Number of tokens in the cached prefix
        requests_per_window: Requests within one cache TTL window
        cache_ttl: "5m" or "1h"
        windows_per_day: How many cache windows occur per day
    """
    write_multiplier = 1.25 if cache_ttl == "5m" else 2.0
    read_multiplier = 0.1

    base_price = model_input_price_per_mtok / 1_000_000

    # Cost without caching
    uncached_per_window = requests_per_window * cached_tokens * base_price

    # Cost with caching (1 write + N-1 reads per window)
    write_cost = cached_tokens * base_price * write_multiplier
    read_cost = max(0, requests_per_window - 1) * cached_tokens * base_price * read_multiplier
    cached_per_window = write_cost + read_cost

    # Daily costs
    uncached_daily = uncached_per_window * windows_per_day
    cached_daily = cached_per_window * windows_per_day
    savings_daily = uncached_daily - cached_daily

    # Break-even: smallest N with cached cost strictly below uncached
    breakeven_n = (write_multiplier - 0.1) / 0.9
    breakeven_requests = int(breakeven_n) + 1  # floor + 1, since the inequality is strict

    return {
        "uncached_daily": f"${uncached_daily:.2f}",
        "cached_daily": f"${cached_daily:.2f}",
        "savings_daily": f"${savings_daily:.2f}",
        "savings_pct": f"{(savings_daily/uncached_daily)*100:.1f}%",
        "breakeven_requests": breakeven_requests,
        "monthly_savings": f"${savings_daily * 30:.2f}"
    }

# Example: Sonnet 4.6, 50K system prompt, 200 requests in 5-min windows
result = cache_calculator(
    model_input_price_per_mtok=3.00,
    cached_tokens=50_000,
    requests_per_window=200,
    cache_ttl="5m",
    windows_per_day=12  # Active for ~1 hour with 5-min windows
)
print(result)
# {'uncached_daily': '$360.00', 'cached_daily': '$38.07',
#  'savings_daily': '$321.93', 'savings_pct': '89.4%',
#  'breakeven_requests': 2, 'monthly_savings': '$9657.90'}
```

For a quick command-line check without saving a script:

```shell
# Quick break-even check: does caching save money?
# Inputs: base_price_per_mtok, cached_tokens, requests_per_window
python3 -c '
base=5.00; tokens=100000; reqs=5; ttl="5m"
w = 1.25 if ttl=="5m" else 2.0
uncached = reqs * tokens * base / 1e6
cached = tokens * base * w / 1e6 + (reqs-1) * tokens * base * 0.1 / 1e6
print(f"Uncached: ${uncached:.3f}")
print(f"Cached: ${cached:.3f}")
print(f"Savings: ${uncached - cached:.3f} ({(uncached-cached)/uncached*100:.0f}%)")
'
```

Note the single quotes around the Python snippet: they keep the shell from expanding `$`, so the f-strings can use plain `$` without backslash escapes.

The calculator handles the degenerate case: if requests_per_window is 1, the cached cost equals the write cost (1.25x or 2.0x base), which is more expensive than not caching. The savings fields simply come out negative, flagging caching as a net loss.
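
The single-request case is easy to check by hand: with no reads to amortize the write, the 1.25x premium is pure overhead. A quick check using the article's Sonnet rate and a 50K-token prompt:

```python
PRICE_PER_TOK = 3.00 / 1_000_000  # base input price, $/token
TOKENS = 50_000                   # cached prefix size

uncached = TOKENS * PRICE_PER_TOK          # $0.150: one full-price read
cached_5m = TOKENS * PRICE_PER_TOK * 1.25  # $0.1875: one write, zero reads
loss = cached_5m - uncached

print(f"net loss from caching a single request: ${loss:.4f}")
# net loss from caching a single request: $0.0375
```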

The Tradeoffs

The break-even math assumes ideal conditions that may not hold:

  - Every request must reuse a byte-identical prompt prefix; any change to the cached portion forces a fresh cache write at the higher rate
  - Requests must actually arrive within the cache TTL; sparse or bursty traffic produces more writes and fewer reads than the formula assumes
  - The prompt must meet the model's minimum cacheable length, or caching is not applied and you pay base price on every request
  - Pricing multipliers can change; re-check them against the current rate card before trusting projected savings

Implementation Checklist

  1. Run the calculator with your actual model, prompt size, and request volume
  2. Verify your prompt exceeds the minimum cacheable token threshold
  3. Confirm requests per cache window exceeds the break-even count (2 for 5-min, 3 for 1-hour)
  4. Implement caching with monitoring on cache_read_input_tokens
  5. Compare predicted savings from the calculator against actual API billing after one week
  6. Re-run the calculator monthly as request volumes change
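
For steps 4 and 5, one way to compare predictions against billing is to aggregate the token counters each Messages API response returns in its usage block (`input_tokens`, `cache_creation_input_tokens`, `cache_read_input_tokens`) and price them. A minimal sketch, assuming the article's Sonnet rates and a 5-minute cache; the `realized_savings` helper is illustrative glue, not part of any SDK:

```python
# Price the usage counters from a batch of API responses and compare
# actual spend against what the same tokens would have cost uncached.
BASE = 3.00 / 1_000_000   # $/token, base input (Sonnet example rate)
WRITE = BASE * 1.25       # 5-minute cache write
READ = BASE * 0.1         # cache read

def realized_savings(usages: list) -> dict:
    fresh = sum(u.get("input_tokens", 0) for u in usages)
    writes = sum(u.get("cache_creation_input_tokens", 0) for u in usages)
    reads = sum(u.get("cache_read_input_tokens", 0) for u in usages)
    actual = fresh * BASE + writes * WRITE + reads * READ
    # What the same traffic would have cost with no caching at all:
    baseline = (fresh + writes + reads) * BASE
    return {
        "actual": actual,
        "baseline": baseline,
        "saved": baseline - actual,
        "hit_rate": reads / (reads + writes) if reads + writes else 0.0,
    }

# Example: 1 cache write + 9 cache hits on a 20K-token prefix,
# which should reproduce the $0.47 savings row from the table above.
usages = [{"cache_creation_input_tokens": 20_000}] + \
         [{"cache_read_input_tokens": 20_000}] * 9
print(realized_savings(usages))
```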

Measuring Impact

Validate your calculator predictions against reality: