Claude Prompt Caching Pricing Guide

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS Join 50+ builders · More at zovo.one

Prompt caching reads cost 10% of the base input price. With a large system prompt reused across multiple requests, caching can reduce your total input token costs by up to 90%. This guide shows you how to calculate the savings.

Quick Fix

Cache reads cost 0.1x base input price. If you reuse the same prompt content 10+ times within 5 minutes, caching saves money immediately:

| Model | Base Input | Cache Write (5m) | Cache Read | Savings per Read |
|------------|---------|------------|------------|-----|
| Opus 4.6   | $5/MTok | $6.25/MTok | $0.50/MTok | 90% |
| Sonnet 4.6 | $3/MTok | $3.75/MTok | $0.30/MTok | 90% |
| Haiku 4.5  | $1/MTok | $1.25/MTok | $0.10/MTok | 90% |

Full Solution

Cache Pricing Breakdown

There are two cache duration options with different write costs:

5-Minute Cache (Default)

- Write cost: 1.25x base input price
- The 5-minute TTL refreshes each time the cached content is read

1-Hour Cache

- Write cost: 2x base input price
- Best for requests spaced more than 5 minutes apart

Pricing Table (per Million Tokens)

| Model | Base Input | 5m Write | 1h Write | Cache Read | Output |
|-------------------|-------|-------|--------|-------|--------|
| Claude Opus 4.6   | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $6.00  | $0.30 | $15.00 |
| Claude Haiku 4.5  | $1.00 | $1.25 | $2.00  | $0.10 | $5.00  |

Break-Even Calculation

The cache write costs more than a regular input. You need enough cache reads to recoup the write cost:

Formula: Break-even reads = cache_write_cost / (base_input_cost - cache_read_cost)

For 5-minute cache on Sonnet 4.6:

Break-even reads = $3.75 / ($3.00 - $0.30) = $3.75 / $2.70 ≈ 1.4

Rounding up, you break even after just 2 cache reads. Every read after that saves 90%.
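The same formula can be scripted for any model; a minimal sketch using the prices from the table above (the helper name is illustrative):

```python
import math

def break_even_reads(base_input: float, cache_write: float, cache_read: float) -> int:
    """Number of cache reads needed to recoup the cache write cost.
    All prices in $/MTok."""
    return math.ceil(cache_write / (base_input - cache_read))

# Sonnet 4.6 prices from the table above
print(break_even_reads(3.00, 3.75, 0.30))  # 5-minute cache -> 2
print(break_even_reads(3.00, 6.00, 0.30))  # 1-hour cache   -> 3
```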

Cost Savings Example

A chatbot with a 5,000-token system prompt handling 100 requests per 5-minute window on Sonnet 4.6:

Without caching:

- 100 requests × 5,000 tokens = 500,000 input tokens
- 500,000 tokens × $3.00/MTok = $1.50

With caching:

- 1 cache write: 5,000 tokens × $3.75/MTok = $0.019
- 99 cache reads: 495,000 tokens × $0.30/MTok = $0.149
- Total: ≈ $0.167

Savings: ~89% ($1.50 vs ~$0.17)
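The arithmetic above generalizes to any prompt size and window; a quick sketch (function name is illustrative, prices in $/MTok from the table):

```python
def caching_cost(prompt_tokens: int, requests: int,
                 base: float, write: float, read: float) -> tuple[float, float]:
    """Input cost ($) without and with caching for one cache window.
    The first request writes the cache; the remaining requests read it."""
    uncached = requests * prompt_tokens * base / 1e6
    cached = (prompt_tokens * write
              + (requests - 1) * prompt_tokens * read) / 1e6
    return uncached, cached

# 5,000-token system prompt, 100 requests, Sonnet 4.6 5-minute cache
uncached, cached = caching_cost(5_000, 100, base=3.00, write=3.75, read=0.30)
print(f"${uncached:.2f} vs ${cached:.3f}")  # $1.50 vs $0.167
```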

ITPM Throughput Boost

Cache-read tokens do NOT count towards your Input Tokens Per Minute (ITPM) rate limit. With an 80% cache hit rate and a 2,000,000 ITPM limit:

- Only the uncached 20% of input tokens count against the limit
- Effective throughput: 2,000,000 / 0.20 = 10,000,000 input tokens per minute

This means caching not only saves money but gives you 5x more effective throughput.
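As a one-line sanity check of that throughput math (helper name is illustrative):

```python
def effective_itpm(itpm_limit: int, cache_hit_rate: float) -> float:
    """Effective input tokens per minute when cache reads
    do not count against the ITPM limit."""
    return itpm_limit / (1 - cache_hit_rate)

print(f"{effective_itpm(2_000_000, 0.80):,.0f}")  # 10,000,000
```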

Implementation

import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """Your large, reusable system prompt here...
[4096+ tokens for Opus 4.6, 1024+ tokens for Sonnet 4.6]
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    # cache_control goes on a content block, not as a top-level parameter
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "User question here"}]
)

# Monitor costs
usage = response.usage
print(f"Input tokens: {usage.input_tokens} @ base price")
print(f"Cache write: {usage.cache_creation_input_tokens} @ 1.25x")
print(f"Cache read: {usage.cache_read_input_tokens} @ 0.1x")

When to Use 1-Hour Cache

Use the 1-hour cache when requests are spaced more than 5 minutes apart:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # extend the TTL from 5 minutes to 1 hour
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    messages=[{"role": "user", "content": "Question?"}]
)

The 1-hour write costs 2x base (vs 1.25x for 5-minute), so you need more reads to break even: on Sonnet 4.6, $6.00 / ($3.00 - $0.30) ≈ 2.2, i.e. 3 cache reads.

Batch API + Caching

The Batch API already gives 50% off input/output pricing. Adding caching compounds the savings, since cache reads are discounted from the already-halved batch input price.

Use the 1-hour cache TTL with batches since batch processing can take up to an hour.
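A rough sketch of the combined discount, assuming the 50% batch discount applies to cache read prices as well (verify against current Anthropic pricing):

```python
# Sonnet 4.6 base input price, $/MTok
base_input = 3.00

batch_input = base_input * 0.5              # $1.50/MTok with Batch API
batch_cache_read = base_input * 0.1 * 0.5   # cache read inside a batch

savings_vs_base = 1 - batch_cache_read / base_input
print(f"${batch_cache_read:.2f}/MTok ({savings_vs_base:.0%} off base input)")
# $0.15/MTok (95% off base input)
```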

Prevention

  1. Cache reusable prompts by default: Add cache_control: {"type": "ephemeral"} to the system content block of every request with a reusable system prompt.
  2. Monitor cache metrics: Track cache_creation_input_tokens and cache_read_input_tokens in every response.
  3. Match TTL to usage pattern: Use 5-minute for real-time chat, 1-hour for batch processing.
  4. Right-size your model: Sonnet 4.6 has a 1,024 token minimum vs Opus 4.6’s 4,096 – smaller prompts can cache on Sonnet but not Opus.
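Tying the prevention steps together, a small monitoring helper (token counts come from the Messages API usage object; the default prices are the Sonnet 4.6 figures from the table above, and the function name is illustrative):

```python
def cache_stats(input_tokens: int, cache_write_tokens: int,
                cache_read_tokens: int,
                base: float = 3.00, write: float = 3.75,
                read: float = 0.30) -> tuple[float, float]:
    """Cache hit rate and input cost ($) for one response.
    Token counts map to usage.input_tokens,
    usage.cache_creation_input_tokens, usage.cache_read_input_tokens."""
    total = input_tokens + cache_write_tokens + cache_read_tokens
    hit_rate = cache_read_tokens / total if total else 0.0
    cost = (input_tokens * base + cache_write_tokens * write
            + cache_read_tokens * read) / 1e6
    return hit_rate, cost

# A typical cached request: small user turn, large cached system prompt
hit, cost = cache_stats(200, 0, 5_000)
print(f"hit rate {hit:.0%}, input cost ${cost:.4f}")  # hit rate 96%, input cost $0.0021
```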