Smart Context Pruning for Claude API (2026)

Last updated: April 19, 2026

A Claude Opus 4.7 session with 200K tokens of accumulated context costs $1.00 per interaction. Pruning that context to 80K tokens — by removing diagnostic noise, deduplicating code blocks, and summarizing old turns — drops the cost to $0.40 per interaction. At 100 interactions per day, that saves $1,800/month.

The Setup

Context pruning removes tokens that do not contribute to Claude’s ability to generate a useful response. In a typical Claude Code session, 40-60% of accumulated context consists of superseded code versions, resolved error traces, build logs, and redundant conversation turns.

Pruning differs from truncation. Truncation blindly removes content from the beginning or end. Pruning selectively removes low-value content while preserving high-value content regardless of its position in the conversation.
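
The difference can be illustrated with a minimal sketch. The helper names `truncate` and `prune` and the sample history are hypothetical, chosen only to show how the two approaches treat an early but important message.

```python
def truncate(messages: list, keep_last: int) -> list:
    """Truncation: keep the most recent messages, whatever they contain."""
    return messages[-keep_last:]


def prune(messages: list, is_low_value) -> list:
    """Pruning: drop messages judged low-value, wherever they sit."""
    return [m for m in messages if not is_low_value(m)]


# Hypothetical three-message history: an early decision, a noisy log, a new request.
history = [
    {"role": "user", "content": "Use PostgreSQL, not SQLite."},
    {"role": "assistant", "content": "npm WARN deprecated foo@1.0.0"},
    {"role": "user", "content": "Now add the migration script."},
]

truncated = truncate(history, keep_last=2)  # loses the PostgreSQL decision
pruned = prune(history, lambda m: m["content"].startswith("npm WARN"))  # keeps it
```

Both results contain two messages, but only the pruned history still carries the early database decision.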

This guide covers four pruning strategies ordered by impact, with code implementations and cost calculations for each.

The Math

Before pruning (200K context, Opus 4.7):

Per interaction: 200K * $5.00/MTok = $1.00
100 interactions/day: $100/day -> $3,000/month

After Strategy 1 — Noise removal (removes 30%):

Context: 140K tokens -> $0.70/interaction
Monthly: $2,100 (30% savings)

After Strategy 2 — Code deduplication (removes additional 15%):

Context: 110K tokens -> $0.55/interaction
Monthly: $1,650 (45% savings)

After Strategy 3 — History summarization (removes additional 15%):

Context: 80K tokens -> $0.40/interaction
Monthly: $1,200 (60% savings)

After Strategy 4 — Output truncation (removes additional 10%):

Context: 60K tokens -> $0.30/interaction
Monthly: $900 (70% savings)

Total savings: $2,100/month (70%)

Scale to 500 interactions/day: savings of $10,500/month.
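
The figures above can be reproduced with a short calculation. This is a sketch that assumes the same inputs used in this section ($5.00 per million input tokens, 100 interactions per day, a 30-day month); the function name `monthly_cost` is illustrative.

```python
RATE_PER_MTOK = 5.00  # input price assumed in the figures above


def monthly_cost(context_tokens: int, interactions_per_day: int = 100,
                 days: int = 30, rate_per_mtok: float = RATE_PER_MTOK) -> float:
    """Monthly input cost in dollars for a fixed context size per interaction."""
    return context_tokens * rate_per_mtok * interactions_per_day * days / 1_000_000


baseline = monthly_cost(200_000)
stages = [
    ("Noise removal", 140_000),
    ("Code deduplication", 110_000),
    ("History summarization", 80_000),
    ("Output truncation", 60_000),
]

rows = []
for name, tokens in stages:
    cost = monthly_cost(tokens)
    saved = baseline - cost
    rows.append((name, cost, saved))
    print(f"{name}: ${cost:,.0f}/month, saves ${saved:,.0f} ({saved / baseline:.0%})")
```

Passing `interactions_per_day=500` to `monthly_cost` gives the scaled figure.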

The Technique

Strategy 1: Remove Diagnostic Noise (Highest Impact)

Build output, deprecation warnings, and verbose error traces accumulate quickly and provide no value once the issue is resolved.

import re
NOISE_PATTERNS = [
    r"npm WARN .*",
    r"warning:.*deprecated.*",
    r"^\s*at\s+\w+.*\(.*:\d+:\d+\)$",  # Stack trace frames
    r"^\s*at\s+Object\.<anonymous>.*",
    r"Compiling\s+\w+.*",
    r"Building\s+.*",
    r"^\s*\d+\s+(verbose|silly|http)\s+.*",  # npm verbose logs
    r"DEPRECATION WARNING.*",
    r"^\s*~+$",  # Underline decorations
]
compiled_patterns = [re.compile(p) for p in NOISE_PATTERNS]
def remove_noise(content: str) -> tuple:
    """Remove diagnostic noise lines from content."""
    lines = content.split("\n")
    clean_lines = []
    removed_count = 0
    for line in lines:
        is_noise = any(p.search(line) for p in compiled_patterns)
        if is_noise:
            removed_count += 1
        else:
            clean_lines.append(line)
    if removed_count > 0:
        clean_lines.append(f"[{removed_count} diagnostic lines removed]")
    return "\n".join(clean_lines), removed_count
def prune_messages_noise(messages: list) -> tuple:
    """Apply noise removal to all messages; return (cleaned_messages, total_lines_removed)."""
    total_removed = 0
    cleaned = []
    for msg in messages:
        clean_content, removed = remove_noise(msg["content"])
        total_removed += removed
        cleaned.append({**msg, "content": clean_content})
    return cleaned, total_removed

Strategy 2: Deduplicate Code Blocks

When Claude iterates on code, the context accumulates multiple versions of the same file. Only the latest version matters.

import hashlib
import re

def deduplicate_code_blocks(messages: list) -> tuple:
    """Keep only the latest version of each code block; return (messages, tokens_saved)."""
    # First pass: find all code blocks and their positions
    code_versions = {}  # filename -> list of {"msg_idx": int, "hash": str}

    for i, msg in enumerate(messages):
        content = msg["content"]
        # Extract file references and code blocks
        blocks = re.findall(r"(?:File:\s*`?([^`\n]+)`?\n)?```[\w]*\n(.*?)```", content, re.DOTALL)
        for filename, code in blocks:
            if not filename:
                filename = f"unnamed_block_{hashlib.md5(code[:100].encode()).hexdigest()[:6]}"
            code_hash = hashlib.md5(code.strip().encode()).hexdigest()
            if filename not in code_versions:
                code_versions[filename] = []
            code_versions[filename].append({"msg_idx": i, "hash": code_hash})
    # Identify outdated versions (all except the last occurrence of each file)
    outdated_hashes = set()
    for filename, versions in code_versions.items():
        if len(versions) > 1:
            for v in versions[:-1]:  # All except last
                outdated_hashes.add((v["msg_idx"], v["hash"]))
    # Second pass: replace outdated code blocks with a placeholder.
    # Matching the whole fence with a regex handles any language tag,
    # not only untagged and python fences.
    fence_re = re.compile(r"```[\w]*\n(.*?)```", re.DOTALL)
    cleaned = []
    tokens_saved = 0
    for i, msg in enumerate(messages):
        def replace_outdated(match, msg_idx=i):
            nonlocal tokens_saved
            code = match.group(1)
            code_hash = hashlib.md5(code.strip().encode()).hexdigest()
            if (msg_idx, code_hash) in outdated_hashes:
                tokens_saved += len(code) // 4  # rough estimate: ~4 characters per token
                return "[superseded code version removed]"
            return match.group(0)  # keep current versions untouched
        content = fence_re.sub(replace_outdated, msg["content"])
        cleaned.append({**msg, "content": content})
    return cleaned, tokens_saved

Strategy 3: Summarize Old Conversation History

import anthropic
client = anthropic.Anthropic()
def summarize_old_turns(
    messages: list,
    keep_recent: int = 8,
    model: str = "claude-haiku-4-5-20251001",
) -> list:
    """Replace old conversation turns with a concise summary."""
    if len(messages) <= keep_recent:
        return messages
    old_messages = messages[:-keep_recent]
    recent_messages = messages[-keep_recent:]
    # Build summary of old messages
    old_text = "\n".join(
        f"{m['role']}: {m['content'][:500]}" for m in old_messages
    )
    summary_resp = client.messages.create(
        model=model,  # Use Haiku for cheap summarization
        max_tokens=800,
        system="Summarize this conversation concisely. Preserve: decisions made, files changed, current task state, unresolved items. Max 250 words.",
        messages=[{"role": "user", "content": old_text}],
    )
    summary = summary_resp.content[0].text
    return [
        {"role": "user", "content": f"[Summary of previous {len(old_messages)} messages: {summary}]"},
        {"role": "assistant", "content": "Understood. Continuing with this context."},
    ] + recent_messages

Strategy 4: Truncate Long Tool Outputs

def truncate_tool_outputs(messages: list, max_output_lines: int = 40) -> list:
    """Truncate verbose tool outputs while keeping key information."""
    cleaned = []
    for msg in messages:
        content = msg["content"]
        lines = content.split("\n")
        if len(lines) > max_output_lines:
            # Keep first 15 and last 15 lines, add truncation marker
            head = lines[:15]
            tail = lines[-15:]
            truncated = len(lines) - 30
            content = "\n".join(head + [f"\n[...{truncated} lines truncated...]\n"] + tail)
        cleaned.append({**msg, "content": content})
    return cleaned

Complete Pruning Pipeline

def full_prune_pipeline(messages: list) -> dict:
    """Apply all four pruning strategies and report the estimated savings."""
    # Rough estimate: about 4 characters per token
    original_tokens = sum(len(m["content"]) // 4 for m in messages)
    # Strategy 1: Noise removal
    messages, noise_removed = prune_messages_noise(messages)
    # Strategy 2: Code deduplication
    messages, code_tokens_saved = deduplicate_code_blocks(messages)
    # Strategy 4: Truncate long outputs (run before summarization)
    messages = truncate_tool_outputs(messages)
    # Strategy 3: Summarize old turns (the only step that calls the API)
    messages = summarize_old_turns(messages)
    final_tokens = sum(len(m["content"]) // 4 for m in messages)
    reduction = (1 - final_tokens / max(original_tokens, 1)) * 100
    return {
        "messages": messages,
        "original_tokens": original_tokens,
        "final_tokens": final_tokens,
        "reduction": f"{reduction:.1f}%",
        "opus_savings_per_request": f"${(original_tokens - final_tokens) * 5 / 1_000_000:.4f}",
    }

The Tradeoffs

Pruning is irreversible within a session. If Claude later needs information that was pruned, you must re-provide it. Mitigate by pruning conservatively (start with noise removal only) and expanding pruning strategies gradually.
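
One way to soften this tradeoff is to keep the original text outside the context so it can be re-provided on demand. The sketch below is one possible approach, not part of the pipeline above; the `PruneArchive` class and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PruneArchive:
    """Hypothetical side store holding original message text outside the context."""
    originals: dict = field(default_factory=dict)

    def record(self, index: int, original: str, pruned: str) -> None:
        # Store the original only when pruning actually changed the message.
        if original != pruned:
            self.originals[index] = original

    def restore(self, index: int, messages: list) -> list:
        # Return a copy of the message list with one message restored.
        if index not in self.originals:
            return messages
        restored = list(messages)
        restored[index] = {**messages[index], "content": self.originals[index]}
        return restored


archive = PruneArchive()
before = [{"role": "user", "content": "build log line 1\nbuild log line 2"}]
after = [{"role": "user", "content": "[2 diagnostic lines removed]"}]
archive.record(0, before[0]["content"], after[0]["content"])
restored = archive.restore(0, after)
```

Message indexes stay stable only for strategies that keep the message count unchanged (noise removal, deduplication, truncation). Summarization replaces several messages with two, so it would need its own bookkeeping.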

Summarization via Haiku adds cost ($0.001-$0.01 per summary) and latency (0.5-2 seconds). This is negligible compared to the savings but worth tracking.
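
To track that cost, the token counts of each summarization call can be converted to dollars. This is a sketch: the function name `summary_cost` is illustrative, and the rates in the example ($1.00 input, $5.00 output per million tokens) are assumptions to replace with current pricing. In a real session the counts would come from `summary_resp.usage.input_tokens` and `summary_resp.usage.output_tokens`.

```python
def summary_cost(input_tokens: int, output_tokens: int,
                 input_rate_per_mtok: float, output_rate_per_mtok: float) -> float:
    """Dollar cost of one summarization call, given token counts and $/MTok rates."""
    return (input_tokens * input_rate_per_mtok
            + output_tokens * output_rate_per_mtok) / 1_000_000


# Illustrative numbers only: 2,500 input tokens and 350 output tokens,
# priced at assumed rates of $1.00 (input) and $5.00 (output) per MTok.
example = summary_cost(2_500, 350, input_rate_per_mtok=1.00, output_rate_per_mtok=5.00)
print(f"${example:.5f} per summary")
```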

Aggressive code deduplication can remove intermediate versions that showed Claude’s reasoning. If debugging Claude’s approach, keep more versions.

Implementation Checklist

Implement noise removal first (lowest risk, highest impact)
Add code deduplication for coding-heavy sessions
Implement output truncation at 40 lines
Add history summarization for sessions longer than 15 turns
Monitor context size before and after pruning
Track quality impact on response usefulness
Tune thresholds based on your specific workflow

Measuring Impact

Measure average context size per interaction with and without pruning, and target a 50-70% reduction. Estimate monthly savings as (original_tokens - pruned_tokens) * interactions_per_day * 30 * rate / 1,000,000, where rate is the input price in dollars per million tokens. Track the re-explanation rate (how often you need to re-provide pruned information) and keep it under 5%.
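
These three measurements can be combined into one report. The sketch below assumes you already log average token counts and count how often pruned information had to be re-provided; the function name `pruning_report` and its field names are illustrative.

```python
def pruning_report(original_tokens: int, pruned_tokens: int,
                   interactions_per_day: int, rate_per_mtok: float,
                   re_explained: int, total_interactions: int,
                   days: int = 30) -> dict:
    """Summarize reduction, monthly savings, and re-explanation rate."""
    reduction = 1 - pruned_tokens / original_tokens
    savings = ((original_tokens - pruned_tokens) * interactions_per_day
               * days * rate_per_mtok / 1_000_000)
    re_rate = re_explained / total_interactions if total_interactions else 0.0
    return {
        "reduction": reduction,
        "monthly_savings": savings,
        "re_explanation_rate": re_rate,
        "reduction_on_target": 0.50 <= reduction <= 0.70,  # 50-70% target
        "re_explanation_ok": re_rate < 0.05,               # under 5%
    }


# Example using the 200K -> 80K scenario from this guide and made-up quality counts.
report = pruning_report(200_000, 80_000, interactions_per_day=100, rate_per_mtok=5.00,
                        re_explained=3, total_interactions=100)
```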

Why Is Claude Code Expensive — context as the primary cost driver
Claude Code Context Window Management Guide — complementary management strategies
Why Does Anthropic Limit Claude Code Context Window — understanding context design decisions
