Without caching (Sonnet 4.6, 50K system prompt, 1,000 calls/day): Daily input cost: 1,000 calls x 50,000 tokens x $3.00/MTok = $150.00/day Monthly: $4,500.00 With 5-minute caching: 1 cache write: 50,000 tokens x $3.75/MTok = $0.19 999 cache reads: 999 x 50,000 tokens x $0.30/MTok = ...

Caching does not help in these situations: Unique prompts: If every request has a completely different system prompt, there is nothing to cache. The 1.25x write cost becomes pure overhead. Low-volume endpoints: If you make fewer than 2 requests within a 5-minute window, the cache expires be...

Claude Prompt Caching Saves 90% (2026)

Last updated: April 19, 2026

A single cached read on Claude Opus 4.7 costs $0.50 per million tokens. Without caching, that same input costs $5.00 per million tokens. That is a 90% reduction on every repeated input token, and it requires roughly five lines of code to activate.

The Setup

You run a customer support bot powered by Claude Sonnet 4.6. Your system prompt is 50,000 tokens of product documentation, tone guidelines, and few-shot examples. Every customer conversation sends that full prompt to the API.

At 1,000 conversations per day, you are paying $150.00 daily just for the system prompt input tokens alone. That is $4,500 per month before you count the actual customer messages or output tokens.

With prompt caching, that same system prompt gets written to cache once and read 999 times at 10% of the base price. Your daily system prompt cost drops to $15.17. Monthly savings: $4,045.

The Math

Without caching (Sonnet 4.6, 50K system prompt, 1,000 calls/day):

Daily input cost: 1,000 calls x 50,000 tokens x $3.00/MTok = $150.00/day
Monthly: $4,500.00

With 5-minute caching:

1 cache write: 50,000 tokens x $3.75/MTok = $0.19
999 cache reads: 999 x 50,000 tokens x $0.30/MTok = $14.99
Daily total: $15.17
Monthly: $455.00

Savings: $4,045 per month (89.9%)

The cache write costs 1.25x the base input price ($3.75 vs $3.00 per MTok for Sonnet 4.6). But each cache read costs just 0.1x the base price ($0.30 vs $3.00). The write pays for itself after a single read.

The Technique

Prompt caching works by adding cache_control breakpoints to your message content. The API caches everything up to that breakpoint and serves it from memory on subsequent requests.

Here is a Python implementation using the Anthropic SDK:

import anthropic
client = anthropic.Anthropic()
# Your large system prompt (must be >= 1,024 tokens for Sonnet 4.6)
system_prompt = open("system_prompt.txt").read()
def query_with_cache(user_message: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-6-20250929",
        max_tokens=4096,
        system=[
            {
                "type": "text",
                "text": system_prompt,
                "cache_control": {"type": "ephemeral"}
            }
        ],
        messages=[
            {"role": "user", "content": user_message}
        ]
    )
    # Check cache performance in response headers
    usage = response.usage
    print(f"Cache read tokens: {usage.cache_read_input_tokens}")
    print(f"Cache write tokens: {usage.cache_creation_input_tokens}")
    return response.content[0].text

The cache_control with type: "ephemeral" sets a 5-minute TTL that refreshes automatically each time the cache is hit. For workloads with gaps longer than 5 minutes between calls, use the 1-hour cache by specifying the appropriate header, though this costs 2.0x the base input price for writes ($6.00/MTok on Sonnet 4.6).

Key implementation details:

Place cache_control breakpoints at the end of your static content sections
You can set up to 4 breakpoints per request
The content before each breakpoint gets cached as a prefix
Minimum cacheable size: 1,024 tokens for Sonnet 4.6, 4,096 tokens for Opus 4.7 and Haiku 4.5

Monitor your cache hit rate by checking cache_read_input_tokens vs cache_creation_input_tokens in every response. A healthy system should show cache reads on 95%+ of requests.

The Tradeoffs

Caching does not help in these situations:

Unique prompts: If every request has a completely different system prompt, there is nothing to cache. The 1.25x write cost becomes pure overhead.
Low-volume endpoints: If you make fewer than 2 requests within a 5-minute window, the cache expires before it pays off.
Small prompts: Below the minimum token threshold (1,024 for Sonnet, 4,096 for Opus), caching simply does not activate.
Cache isolation: Since February 2026, caches are isolated per workspace, not per organization. A prompt cached in one workspace cannot be read from another.

For workloads with gaps between requests exceeding 5 minutes, consider the 1-hour cache TTL. It costs 2.0x the base input price for writes ($6.00/MTok on Sonnet 4.6 vs $3.75/MTok for the 5-minute cache) but survives idle periods up to 60 minutes. The break-even point: if more than 20% of your inter-request gaps exceed 5 minutes, the 1-hour cache saves more overall. On Opus 4.7, the 1-hour cache write costs $10.00/MTok vs $6.25/MTok for 5-minute, but a single avoided re-write of 50K tokens saves $0.3125 ($6.25/MTok x 50K) – recovered with just 1 additional cache read at $0.025.

You can also stack caching with batch processing for the absolute lowest cost. Batch processing halves the base rate, and cache reads are 10% of the base rate. On batch Opus 4.7, cached reads cost just $0.25/MTok – 95% less than standard uncached input at $5.00/MTok. A pipeline processing 50M cached input tokens monthly drops from $250.00 (standard uncached) to $12.50 (batch + cache read).

Implementation Checklist

Identify your largest static prompt components (system prompts, few-shot examples, documentation)
Verify the static content exceeds the minimum cacheable token count for your model
Add cache_control breakpoints to the end of each static content block
Deploy and monitor cache_read_input_tokens in response usage data
Calculate your actual cache hit rate: reads / (reads + writes)
If hit rate is below 80%, investigate request timing patterns
Consider 1-hour cache if requests are spaced more than 5 minutes apart

Measuring Impact

Track these metrics before and after enabling caching:

Cache hit rate: cache_read_input_tokens / total_input_tokens per hour
Cost per request: Compare your daily API bill divided by request count
Break-even verification: Confirm your write-to-read ratio exceeds 1:1 for 5-minute cache or 1:2 for 1-hour cache
Use the Anthropic dashboard usage tab to compare week-over-week input token costs

Which model? → Take the 5-question quiz in our Model Selector.

Try it: Estimate your monthly spend with our Cost Calculator.

Claude Prompt Caching Saves 90% (2026)

The Setup

The Math

The Technique

The Tradeoffs

Implementation Checklist

Measuring Impact

See Also

About the Author

The Setup

The Math

The Technique

The Tradeoffs

Implementation Checklist

Measuring Impact

Related Guides

See Also

About the Author

Related Guides