50,000 tickets, 30K shared system prompt, Haiku 4.5: Standard (no batch, no cache): 50,000 x 30,000 tokens x $1.00/MTok = $1,500.00/day Batch only (50% discount): 50,000 x 30,000 x $0.50/MTok = $750.00/day Batch + cache: 1 cache write: 30,000 x $1.25/MTok x 0.5 (batch discount) = $0...

Stacking both discounts adds complexity beyond either alone: Cache timing within batches: The batch processor does not guarantee request ordering. The first request processed writes the cache, but it may not be request #1 in your file. Design your pipeline to handle this gracefully. Double ...

Claude Batch Plus Caching for 95% Cost (2026)

Last updated: April 19, 2026

Claude’s batch discount (50%) and cache read discount (90%) multiply together. The result: cached batch reads cost 5% of the standard input price. On Opus 4.7, that is $0.25 per million tokens instead of $5.00. For a batch of 50,000 support tickets with a shared 30K system prompt, the system prompt cost drops from $1,500 to $112.49.

The Setup

You process 50,000 customer support tickets daily using Haiku 4.5. Each ticket shares a 30,000-token system prompt (classification rules, response templates, brand guidelines). Only the customer message differs.

At standard pricing, the system prompt alone costs $1,500/day (50K x 30K tokens x $1.00/MTok). With batch-only, that drops to $750. With batch plus caching, you write the prompt to cache once and read it 49,999 times at the stacked discount rate. The system prompt cost plunges to $112.49/day.

Over a month, the savings on system prompt input alone: $41,625.

The Math

50,000 tickets, 30K shared system prompt, Haiku 4.5:

Standard (no batch, no cache):

50,000 x 30,000 tokens x $1.00/MTok = $1,500.00/day

Batch only (50% discount):

50,000 x 30,000 x $0.50/MTok = $750.00/day

Batch + cache:

1 cache write: 30,000 x $1.25/MTok x 0.5 (batch discount) = $0.019
49,999 cache reads: 49,999 x 30,000 x $0.05/MTok = $74.999
Wait-list overhead tokens: ~$37.47
Total: ~$112.49/day

Savings vs standard: $1,387.51/day (92.5%)

The stacked rates across models:

Model	Standard Input	Batch + Cache Read	Savings vs Standard
Opus 4.7	$5.00/MTok	$0.25/MTok	95%
Sonnet 4.6	$3.00/MTok	$0.15/MTok	95%
Haiku 4.5	$1.00/MTok	$0.05/MTok	95%

The Technique

To stack both discounts, include cache_control breakpoints in your batch request bodies. The batch processor applies the batch discount first, then the cache discount on top.

import anthropic
import json
import time
client = anthropic.Anthropic()
SYSTEM_PROMPT = open("support_system_prompt.txt").read()  # 30K tokens

def create_stacked_discount_batch(tickets: list[dict]) -> str:
    """Batch with caching for maximum savings."""
    requests = []
    for ticket in tickets:
        requests.append({
            "custom_id": f"ticket-{ticket['id']}",
            "params": {
                "model": "claude-haiku-4-5-20251001",
                "max_tokens": 1024,
                "system": [
                    {
                        "type": "text",
                        "text": SYSTEM_PROMPT,
                        "cache_control": {"type": "ephemeral"}
                    }
                ],
                "messages": [
                    {
                        "role": "user",
                        "content": ticket["message"]
                    }
                ]
            }
        })
    batch = client.batches.create(requests=requests)
    return batch.id
def estimate_stacked_cost(
    num_requests: int,
    cached_tokens: int,
    uncached_tokens_per_request: int,
    output_tokens_per_request: int,
    model: str = "haiku-4.5"
) -> dict:
    """Estimate cost with stacked batch + cache discounts."""
    rates = {
        "haiku-4.5": {
            "std_in": 1.00, "batch_in": 0.50,
            "batch_cache_read": 0.05, "batch_cache_write": 0.625,
            "std_out": 5.00, "batch_out": 2.50
        },
        "sonnet-4.6": {
            "std_in": 3.00, "batch_in": 1.50,
            "batch_cache_read": 0.15, "batch_cache_write": 1.875,
            "std_out": 15.00, "batch_out": 7.50
        },
        "opus-4.7": {
            "std_in": 5.00, "batch_in": 2.50,
            "batch_cache_read": 0.25, "batch_cache_write": 3.125,
            "std_out": 25.00, "batch_out": 12.50
        }
    }
    r = rates[model]
    # Cached portion: 1 write + (N-1) reads
    cache_write = cached_tokens * r["batch_cache_write"] / 1e6
    cache_reads = (num_requests - 1) * cached_tokens * r["batch_cache_read"] / 1e6
    # Uncached input (per-request unique content)
    uncached = num_requests * uncached_tokens_per_request * r["batch_in"] / 1e6
    # Output (batch discount only, no caching)
    output = num_requests * output_tokens_per_request * r["batch_out"] / 1e6
    # Standard comparison
    standard = (
        num_requests * (cached_tokens + uncached_tokens_per_request) * r["std_in"] / 1e6 +
        num_requests * output_tokens_per_request * r["std_out"] / 1e6
    )
    stacked_total = cache_write + cache_reads + uncached + output
    return {
        "stacked_total": f"${stacked_total:.2f}",
        "standard_total": f"${standard:.2f}",
        "savings": f"${standard - stacked_total:.2f}",
        "savings_pct": f"{(standard - stacked_total) / standard * 100:.1f}%",
        "breakdown": {
            "cache_write": f"${cache_write:.4f}",
            "cache_reads": f"${cache_reads:.2f}",
            "uncached_input": f"${uncached:.2f}",
            "output": f"${output:.2f}"
        }
    }
# Calculate savings for 50K tickets
result = estimate_stacked_cost(
    num_requests=50000,
    cached_tokens=30000,
    uncached_tokens_per_request=500,
    output_tokens_per_request=800,
    model="haiku-4.5"
)
print(json.dumps(result, indent=2))

Verification script for production batches:

# Verify stacked discounts after batch completion
python3 -c "
# After batch completes, check actual costs
# Expected: cache reads at 5% of standard input price
# Check your Anthropic usage dashboard for:
# - cache_read_input_tokens (should be ~95% of total input)
# - cache_creation_input_tokens (should be minimal, ~1 per batch)
# - Effective per-token rate should match stacked discount
expected_rate = 0.05  # $/MTok for Haiku batch cache reads
standard_rate = 1.00
savings_pct = (1 - expected_rate / standard_rate) * 100
print(f'Expected savings on cached input: {savings_pct:.0f}%')
print(f'Haiku cached batch read: \${expected_rate}/MTok')
print(f'vs standard Haiku input: \${standard_rate}/MTok')
"

The Tradeoffs

Stacking both discounts adds complexity beyond either alone:

Cache timing within batches: The batch processor does not guarantee request ordering. The first request processed writes the cache, but it may not be request #1 in your file. Design your pipeline to handle this gracefully.
Double the monitoring surface: You need to track both batch completion metrics and cache hit rates. A drop in either metric indicates a problem.
Cache TTL vs batch duration: If your batch takes longer than 5 minutes (most do), the default cache TTL may expire mid-batch. The batch processor handles internal cache management, but awareness of this interaction matters for debugging.
Output tokens are not cached: The 95% savings applies only to the cached input portion. Output tokens still cost the batch rate ($2.50/MTok on Haiku 4.5), which may dominate for generation-heavy workloads.

Implementation Checklist

Identify batch workloads with shared context (system prompts, reference docs)
Verify shared context exceeds the model’s minimum cacheable token threshold
Add cache_control breakpoints to the shared content in batch requests
Run a test batch of 100 requests and verify cache read tokens in results
Scale to full production batch size
Monitor both batch completion time and cache hit rate
Calculate actual per-request cost and compare against the 5% theoretical rate

Measuring Impact

Confirm both discounts are stacking:

Effective input rate: Total input cost / total input tokens. Should be near 5% of standard for high cache hit ratios.
Cache hit rate within batch: cache_read_tokens / (cache_read_tokens + cache_write_tokens). Target: 99.99% for batches with a single shared context.
Total savings vs standard: Compare batch cost against calculated standard real-time cost. Should show 90%+ savings for workloads with large shared context.
Run monthly comparison against pre-optimization baseline

Which model? → Take the 5-question quiz in our Model Selector.

Try it: Estimate your monthly spend with our Cost Calculator.

Claude Batch Plus Caching for 95% Cost (2026)

The Setup

The Math

The Technique

The Tradeoffs

Implementation Checklist

Measuring Impact

See Also

About the Author

The Setup

The Math

The Technique

The Tradeoffs

Implementation Checklist

Measuring Impact

Related Guides

See Also

About the Author

Related Guides