Opus 4.7, 3,000-token prompt (below 4,096 minimum), 500 requests/day: Attempted with caching (but below threshold – caching silently disabled): 500 x 3,000 tokens x $5.00/MTok = $7.50/day = $225/month Same prompt on Sonnet 4.6 (above 1,024 minimum – caching works): 1 write: 3,000 x $3.75...

Working around minimum token requirements involves compromises: Padding prompts risks quality: Adding filler content to reach 4,096 tokens can confuse the model. Only add contextually relevant material. Downgrading models risks accuracy: Sonnet 4.6 has a lower threshold but may produce lowe...

Claude Cache Minimum Token Requirements (2026)

Last updated: April 19, 2026

Claude prompt caching silently fails if your content falls below the minimum token threshold. Opus 4.7 requires 4,096 tokens. Sonnet 4.6 requires 1,024 tokens. Below these limits, your cache_control breakpoints are ignored, and you pay the full $5.00/MTok or $3.00/MTok on every request with zero savings.

The Setup

You have a 3,000-token system prompt running on Opus 4.7 with 500 requests per day. You add cache_control breakpoints expecting to save 90% on input costs. But 3,000 tokens is below the 4,096-token minimum for Opus. The caching never activates. You pay full price on all 500 requests – $7.50/day instead of the $0.88 you expected.

The fix is either to pad the prompt above the threshold (adding useful context, not filler) or to switch to Sonnet 4.6, which has a 1,024-token minimum. On Sonnet, the same 3,000-token prompt caches successfully, and 500 requests/day drops from $4.50 to $0.53.

The Math

Opus 4.7, 3,000-token prompt (below 4,096 minimum), 500 requests/day:

Attempted with caching (but below threshold – caching silently disabled):

500 x 3,000 tokens x $5.00/MTok = $7.50/day = $225/month

Same prompt on Sonnet 4.6 (above 1,024 minimum – caching works):

1 write: 3,000 x $3.75/MTok = $0.011
499 reads: 499 x 3,000 x $0.30/MTok = $0.449
Total: $0.46/day = $13.80/month

Savings from choosing the right model: $211.20/month

Full minimum token requirements table:

Model	Minimum Cacheable Tokens
Opus 4.7	4,096
Opus 4.6	4,096
Opus 4.5	4,096
Opus 4.1 / Opus 4	1,024
Sonnet 4.6 / 4.5 / 3.7	1,024
Haiku 4.5	4,096
Haiku 3.5	2,048
Haiku 3	2,048

The Technique

The API does not return an error when your content is below the minimum. It simply processes the request without caching. You must verify caching is active by checking the response usage fields.

import anthropic
client = anthropic.Anthropic()
def verify_caching(model: str, system_text: str) -> dict:
    """Test whether caching activates for a given prompt and model."""
    # First request -- should trigger cache write
    response = client.messages.create(
        model=model,
        max_tokens=50,
        system=[
            {
                "type": "text",
                "text": system_text,
                "cache_control": {"type": "ephemeral"}
            }
        ],
        messages=[{"role": "user", "content": "Hello."}]
    )
    usage = response.usage
    cache_write = getattr(usage, 'cache_creation_input_tokens', 0)
    cache_read = getattr(usage, 'cache_read_input_tokens', 0)
    uncached = usage.input_tokens
    status = "ACTIVE" if cache_write > 0 else "INACTIVE"
    return {
        "model": model,
        "cache_status": status,
        "cache_write_tokens": cache_write,
        "uncached_tokens": uncached,
        "prompt_tokens_estimated": len(system_text.split()) * 1.3
    }
# Test the same prompt on two models
prompt = open("system_prompt.txt").read()
result_opus = verify_caching("claude-opus-4-7-20250415", prompt)
result_sonnet = verify_caching("claude-sonnet-4-6-20250929", prompt)
print(f"Opus: {result_opus['cache_status']} "
      f"(~{result_opus['prompt_tokens_estimated']:.0f} tokens)")
print(f"Sonnet: {result_sonnet['cache_status']} "
      f"(~{result_sonnet['prompt_tokens_estimated']:.0f} tokens)")

If your prompt falls below the threshold, here are three options ranked by effectiveness:

Option 1: Add useful context to exceed the threshold. If your system prompt is 3,000 tokens on Opus, add 1,100+ tokens of relevant few-shot examples, domain terminology, or formatting instructions. This improves output quality while enabling caching.

Option 2: Switch models. Move from Opus 4.7 (4,096 minimum) to Sonnet 4.6 (1,024 minimum). For many workloads, Sonnet produces comparable quality at lower base pricing ($3.00 vs $5.00/MTok) and activates caching on smaller prompts.

Option 3: Combine multiple cacheable blocks. If you have a 2,000-token system prompt and a 3,000-token few-shot block, combine them under a single breakpoint. The 5,000-token combined block exceeds the 4,096 threshold for Opus.

# Count tokens in your system prompt using the Anthropic tokenizer
python3 -c "
import anthropic
client = anthropic.Anthropic()
text = open('system_prompt.txt').read()
count = client.count_tokens(text)
thresholds = {'opus-4.7': 4096, 'sonnet-4.6': 1024, 'haiku-4.5': 4096}
print(f'Token count: {count}')
for model, min_t in thresholds.items():
    status = 'OK' if count >= min_t else 'BELOW MINIMUM'
    print(f'  {model}: {status} (need {min_t})')
"

The Tradeoffs

Working around minimum token requirements involves compromises:

Padding prompts risks quality: Adding filler content to reach 4,096 tokens can confuse the model. Only add contextually relevant material.
Downgrading models risks accuracy: Sonnet 4.6 has a lower threshold but may produce lower quality output on complex reasoning tasks. Test quality before switching.
Combining blocks reduces flexibility: Merging two content types under one breakpoint means both get invalidated when either changes, increasing write costs.
Token counting varies: The new tokenizer in Opus 4.7 may use up to 35% more tokens for the same text compared to older models. Content that falls below the threshold on Sonnet 4.5 might exceed it on Opus 4.7.

Implementation Checklist

Count tokens in every cacheable content block using the Anthropic tokenizer
Compare counts against the minimum threshold for your target model
If below threshold: add useful context, switch models, or combine blocks
Deploy with monitoring on cache_creation_input_tokens
If cache_creation_input_tokens is 0 on the first request, caching is not activating
Re-verify after any prompt changes that might reduce token count

Measuring Impact

After resolving threshold issues:

First-request verification: cache_creation_input_tokens > 0 confirms caching is active
Daily cache write count: Should match the number of unique prompts, not the number of total requests
Cost comparison: If you padded your prompt, compare the slight increase in per-token cost against the 90% cache read savings
Run a weekly audit to catch any prompt updates that push content below the minimum threshold

Which model? → Take the 5-question quiz in our Model Selector.

Estimate tokens → Calculate your usage with our Token Estimator.

Try it: Estimate your monthly spend with our Cost Calculator.

Claude Cache Minimum Token Requirements (2026)

The Setup

The Math

The Technique

The Tradeoffs

Implementation Checklist

Measuring Impact

See Also

About the Author

The Setup

The Math

The Technique

The Tradeoffs

Implementation Checklist

Measuring Impact

Related Guides

See Also

About the Author

Related Guides