Claude Prompt Caching Pricing Guide

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS Join 50+ builders · More at zovo.one

Prompt caching reads cost 10% of the base input price. With a large system prompt reused across multiple requests, caching can reduce your total input token costs by up to 90%. This guide shows you how to calculate the savings.

Quick Fix

Cache reads cost 0.1x base input price. If you reuse the same prompt content 10+ times within 5 minutes, caching saves money immediately:

| Model | Base Input | Cache Write (5m) | Cache Read | Savings per Read |
|------------|---------|------------|------------|-----|
| Opus 4.6   | $5/MTok | $6.25/MTok | $0.50/MTok | 90% |
| Sonnet 4.6 | $3/MTok | $3.75/MTok | $0.30/MTok | 90% |
| Haiku 4.5  | $1/MTok | $1.25/MTok | $0.10/MTok | 90% |

Full Solution

Cache Pricing Breakdown

There are two cache duration options with different write costs:

5-Minute Cache (Default)

- Write cost: 1.25x base input price
- The 5-minute TTL refreshes each time the cached content is read

1-Hour Cache

- Write cost: 2x base input price
- Best for requests spaced more than 5 minutes apart

Pricing Table (per Million Tokens)

| Model | Base Input | 5m Write | 1h Write | Cache Read | Output |
|-------------------|-------|-------|--------|-------|--------|
| Claude Opus 4.6   | $5.00 | $6.25 | $10.00 | $0.50 | $25.00 |
| Claude Sonnet 4.6 | $3.00 | $3.75 | $6.00  | $0.30 | $15.00 |
| Claude Haiku 4.5  | $1.00 | $1.25 | $2.00  | $0.10 | $5.00  |

Break-Even Calculation

The cache write costs more than a regular input. You need enough cache reads to recoup the write cost:

Formula: Break-even reads = cache_write_cost / (base_input_cost - cache_read_cost)

For 5-minute cache on Sonnet 4.6:

Break-even reads = $3.75 / ($3.00 - $0.30) = $3.75 / $2.70 ≈ 1.4

Rounding up, you break even after just 2 cache reads. Every read after that saves 90%.
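The same formula can be scripted for any model; a minimal sketch using the prices from the table above (the helper name is illustrative):

```python
import math

def break_even_reads(base_input: float, cache_write: float, cache_read: float) -> int:
    """Number of cache reads needed to recoup the cache write cost.
    All prices in $/MTok."""
    return math.ceil(cache_write / (base_input - cache_read))

# Sonnet 4.6 prices from the table above
print(break_even_reads(3.00, 3.75, 0.30))  # 5-minute cache -> 2
print(break_even_reads(3.00, 6.00, 0.30))  # 1-hour cache   -> 3
```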

Cost Savings Example

A chatbot with a 5,000-token system prompt handling 100 requests per 5-minute window on Sonnet 4.6:

Without caching:

- 100 requests × 5,000 tokens = 500,000 input tokens
- 500,000 tokens × $3.00/MTok = $1.50

With caching:

- 1 cache write: 5,000 tokens × $3.75/MTok = $0.019
- 99 cache reads: 495,000 tokens × $0.30/MTok = $0.149
- Total: ≈ $0.167

Savings: ~89% ($1.50 vs ~$0.17)
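The arithmetic above generalizes to any prompt size and window; a quick sketch (function name is illustrative, prices in $/MTok from the table):

```python
def caching_cost(prompt_tokens: int, requests: int,
                 base: float, write: float, read: float) -> tuple[float, float]:
    """Input cost ($) without and with caching for one cache window.
    The first request writes the cache; the remaining requests read it."""
    uncached = requests * prompt_tokens * base / 1e6
    cached = (prompt_tokens * write
              + (requests - 1) * prompt_tokens * read) / 1e6
    return uncached, cached

# 5,000-token system prompt, 100 requests, Sonnet 4.6 5-minute cache
uncached, cached = caching_cost(5_000, 100, base=3.00, write=3.75, read=0.30)
print(f"${uncached:.2f} vs ${cached:.3f}")  # $1.50 vs $0.167
```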

ITPM Throughput Boost

Cache-read tokens do NOT count towards your Input Tokens Per Minute (ITPM) rate limit. With an 80% cache hit rate and a 2,000,000 ITPM limit:

- Only the uncached 20% of input tokens count against the limit
- Effective throughput: 2,000,000 / 0.20 = 10,000,000 input tokens per minute

This means caching not only saves money but gives you 5x more effective throughput.
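As a one-line sanity check of that throughput math (helper name is illustrative):

```python
def effective_itpm(itpm_limit: int, cache_hit_rate: float) -> float:
    """Effective input tokens per minute when cache reads
    do not count against the ITPM limit."""
    return itpm_limit / (1 - cache_hit_rate)

print(f"{effective_itpm(2_000_000, 0.80):,.0f}")  # 10,000,000
```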

Implementation

import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """Your large, reusable system prompt here...
[4096+ tokens for Opus 4.6, 1024+ tokens for Sonnet 4.6]
"""

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    # cache_control goes on a content block, not as a top-level parameter
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "User question here"}]
)

# Monitor costs
usage = response.usage
print(f"Input tokens: {usage.input_tokens} @ base price")
print(f"Cache write: {usage.cache_creation_input_tokens} @ 1.25x")
print(f"Cache read: {usage.cache_read_input_tokens} @ 0.1x")

When to Use 1-Hour Cache

Use the 1-hour cache when requests are spaced more than 5 minutes apart:

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": SYSTEM_PROMPT,
            # extend the TTL from 5 minutes to 1 hour
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    messages=[{"role": "user", "content": "Question?"}]
)

The 1-hour write costs 2x base (vs 1.25x for 5-minute), so you need more reads to break even: on Sonnet 4.6, $6.00 / ($3.00 - $0.30) ≈ 2.2, i.e. 3 cache reads.

Batch API + Caching

The Batch API already gives 50% off input/output pricing. Adding caching compounds the savings, since cache reads are discounted from the already-halved batch input price.

Use the 1-hour cache TTL with batches since batch processing can take up to an hour.
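A rough sketch of the combined discount, assuming the 50% batch discount applies to cache read prices as well (verify against current Anthropic pricing):

```python
# Sonnet 4.6 base input price, $/MTok
base_input = 3.00

batch_input = base_input * 0.5              # $1.50/MTok with Batch API
batch_cache_read = base_input * 0.1 * 0.5   # cache read inside a batch

savings_vs_base = 1 - batch_cache_read / base_input
print(f"${batch_cache_read:.2f}/MTok ({savings_vs_base:.0%} off base input)")
# $0.15/MTok (95% off base input)
```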

Prevention

  1. Cache reusable prompts by default: Add cache_control: {"type": "ephemeral"} to the system content block of every request with a reusable system prompt.
  2. Monitor cache metrics: Track cache_creation_input_tokens and cache_read_input_tokens in every response.
  3. Match TTL to usage pattern: Use 5-minute for real-time chat, 1-hour for batch processing.
  4. Right-size your model: Sonnet 4.6 has a 1,024 token minimum vs Opus 4.6’s 4,096 – smaller prompts can cache on Sonnet but not Opus.
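Tying the prevention steps together, a small monitoring helper (token counts come from the Messages API usage object; the default prices are the Sonnet 4.6 figures from the table above, and the function name is illustrative):

```python
def cache_stats(input_tokens: int, cache_write_tokens: int,
                cache_read_tokens: int,
                base: float = 3.00, write: float = 3.75,
                read: float = 0.30) -> tuple[float, float]:
    """Cache hit rate and input cost ($) for one response.
    Token counts map to usage.input_tokens,
    usage.cache_creation_input_tokens, usage.cache_read_input_tokens."""
    total = input_tokens + cache_write_tokens + cache_read_tokens
    hit_rate = cache_read_tokens / total if total else 0.0
    cost = (input_tokens * base + cache_write_tokens * write
            + cache_read_tokens * read) / 1e6
    return hit_rate, cost

# A typical cached request: small user turn, large cached system prompt
hit, cost = cache_stats(200, 0, 5_000)
print(f"hit rate {hit:.0%}, input cost ${cost:.4f}")  # hit rate 96%, input cost $0.0021
```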