When NOT to Use Claude Prompt Caching

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS Join 50+ builders · More at zovo.one

Prompt caching is not free. Every cache write costs 1.25x the standard input price on 5-minute TTL and 2.0x on 1-hour TTL. If your cache never gets read, you pay that premium with zero return. A system that writes cache entries without reading them spends 25% more than one with no caching at all.

The Setup

You run a content generation service where each request has a unique system prompt tailored to the specific brand, voice, and product being written about. No two requests share the same prompt. You enabled prompt caching across all endpoints expecting savings.

Instead of saving money, you added 25% to every input cost. On Opus 4.7, your effective input price went from $5.00/MTok to $6.25/MTok – the cache write premium – because every request triggered a write and none triggered a read. At 1,000 requests/day with 10,000-token prompts, that is an extra $12.50/day or $375/month in pure waste.

The Math

Scenario: Unique prompts, no cache reuse, Opus 4.7, 10K tokens, 1,000 requests/day:

Without caching:

With caching enabled (100% write, 0% read):

Extra cost: $12.50/day ($375/month) for zero benefit

The break-even threshold requires at least 1 read per write for 5-minute cache (2 total requests per unique prompt). If your hit rate is below 50% (more writes than reads), you are losing money.

1-hour cache on the same workload is worse:

The Technique

Before enabling caching, audit your prompt reuse patterns. Here is a diagnostic script that analyzes your request logs:

import json
import hashlib
from collections import Counter

def analyze_cache_potential(log_file: str) -> dict:
    """Analyze API request logs to determine caching viability."""

    prompt_hashes = []
    with open(log_file) as f:
        for line in f:
            req = json.loads(line)
            system_text = req.get("system", "")
            if isinstance(system_text, list):
                system_text = " ".join(
                    b["text"] for b in system_text if b.get("type") == "text"
                )
            prompt_hash = hashlib.sha256(system_text.encode()).hexdigest()[:16]
            prompt_hashes.append(prompt_hash)

    total_requests = len(prompt_hashes)
    unique_prompts = len(set(prompt_hashes))
    frequency = Counter(prompt_hashes)

    # Prompts used more than once (cacheable)
    reusable = sum(1 for count in frequency.values() if count > 1)
    reusable_requests = sum(
        count for count in frequency.values() if count > 1
    )

    reuse_ratio = reusable_requests / total_requests if total_requests > 0 else 0

    return {
        "total_requests": total_requests,
        "unique_prompts": unique_prompts,
        "reusable_prompts": reusable,
        "reuse_ratio": f"{reuse_ratio:.1%}",
        "recommendation": (
            "ENABLE caching" if reuse_ratio > 0.5
            else "DO NOT cache" if reuse_ratio < 0.2
            else "SELECTIVE caching (cache only reused prompts)"
        )
    }

result = analyze_cache_potential("api_requests.jsonl")
print(json.dumps(result, indent=2))

Five scenarios where caching costs more than it saves:

1. Unique prompts per request. Personalized content generation, dynamic RAG prompts with different retrieved chunks, or A/B testing multiple prompt variants. Every request triggers a write with no reads.

2. Low-frequency endpoints. An internal tool used 3 times per day with 15-minute gaps between requests. The 5-minute cache expires before the next request. Each request is a write.

3. Rapidly iterating prompts. During prompt engineering, you change the system prompt every few requests. Each version gets cached and immediately abandoned. Switch caching off during development.

4. Small prompts below the minimum threshold. A 500-token system prompt on Opus 4.7 (minimum 4,096) silently skips caching. The cache_control annotation is ignored, but you pay no penalty in this case. The risk is false confidence – you think caching is active when it is not.

5. Output-dominated costs. If your workload generates 10x more output tokens than input tokens, caching the input saves 90% of a small fraction of your bill. A request with 5K input and 50K output on Opus 4.7 costs $0.025 input + $1.25 output. Caching saves at most $0.0225 – 1.8% of the total.

The Tradeoffs

The decision to skip caching has its own costs:

Implementation Checklist

  1. Run the prompt reuse analysis script on one week of production request logs
  2. Calculate your prompt reuse ratio (target: above 50% for caching to be worthwhile)
  3. If reuse ratio is below 20%, do not enable caching
  4. If reuse ratio is 20-50%, enable caching only on endpoints with high-reuse prompts
  5. Disable caching on development and testing endpoints where prompts change frequently
  6. Re-evaluate every month as traffic patterns shift

Measuring Impact

If you suspect caching is costing you money: