Why Large Context Makes Claude Code Expensive

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS Join 50+ builders · More at zovo.one

A Claude Code session on Opus 4.7 starts at roughly 5K tokens and grows to 150K+ tokens within 30 interactions. Each interaction at 150K context costs $0.75 in input tokens alone. The average developer spends $6/day on Claude Code — and most of that goes to re-sending bloated context rather than generating new output.

The Setup

Claude Code maintains a running conversation history that includes every user message, assistant response, tool call, tool result, code edit, terminal output, and error trace from the entire session. Every new interaction sends this entire history back to the API as context.

The economics are straightforward: if your context is 150K tokens and input costs $5.00/MTok on Opus 4.7, each interaction costs $0.75 just to re-read the conversation history. The actual new content you are asking about might be 500 tokens — but you pay for 150K.

This guide breaks down exactly where Claude Code context tokens accumulate and how to manage them.

The Math

Context growth in a typical 30-interaction Claude Code session:

Interaction Est. Context Opus Input Cost Cumulative Cost
1 5K $0.025 $0.025
5 25K $0.125 $0.375
10 50K $0.250 $1.375
15 75K $0.375 $3.125
20 100K $0.500 $5.625
25 125K $0.625 $8.875
30 150K $0.750 $12.875

Total session input cost: $12.88 on Opus 4.7

For a developer running 3 sessions/day: $38.63/day -> $1,158.90/month in input tokens alone

Add output tokens (average 2K per interaction at $25/MTok):

With context management (compact at interaction 15):

Savings: $607.50/month (47%)

The Technique

Understanding Where Tokens Accumulate

def analyze_claude_code_context_growth(session_log: list) -> dict:
    """Analyze token growth patterns in a Claude Code session."""
    components = {
        "tool_calls": 0,
        "tool_results": 0,
        "code_output": 0,
        "error_traces": 0,
        "assistant_text": 0,
        "user_messages": 0,
    }

    for entry in session_log:
        content = entry.get("content", "")
        char_count = len(content)
        token_est = char_count // 4

        role = entry.get("role", "")

        if entry.get("type") == "tool_use":
            components["tool_calls"] += token_est
        elif entry.get("type") == "tool_result":
            # Tool results are the biggest token consumers
            if "Error" in content or "Traceback" in content:
                components["error_traces"] += token_est
            elif "```" in content:
                components["code_output"] += token_est
            else:
                components["tool_results"] += token_est
        elif role == "assistant":
            components["assistant_text"] += token_est
        elif role == "user":
            components["user_messages"] += token_est

    total = sum(components.values())
    percentages = {k: f"{v/max(total,1)*100:.1f}%" for k, v in components.items()}

    return {
        "total_tokens": total,
        "breakdown": components,
        "percentages": percentages,
        "top_consumer": max(components, key=components.get),
    }

The Biggest Token Consumers in Claude Code

Based on production usage analysis:

  1. Tool results (40-50%) — Terminal output, file contents read by Claude, build logs
  2. Code blocks (20-30%) — Full file contents in assistant responses
  3. Error traces (10-15%) — Stack traces, build errors, test failures
  4. Assistant text (10-15%) — Explanations, analysis, suggestions
  5. User messages (5-10%) — Your actual requests
# Strategies to reduce each component:

# 1. Tool results: Truncate long outputs
def truncate_tool_result(output: str, max_lines: int = 50) -> str:
    """Keep first and last N lines, truncate middle."""
    lines = output.split("\n")
    if len(lines) <= max_lines:
        return output
    half = max_lines // 2
    return "\n".join(lines[:half] + [f"\n... ({len(lines) - max_lines} lines truncated) ...\n"] + lines[-half:])

# 2. Code blocks: Reference files instead of including full content
# In Claude Code, prefer "read file X" over pasting file content

# 3. Error traces: Use compact error messages
def compact_error(traceback: str) -> str:
    """Extract the essential error info from a full traceback."""
    lines = traceback.strip().split("\n")
    # Keep last line (the actual error) and the file/line references
    essential = [l for l in lines if "File " in l or "Error" in l or "Exception" in l]
    return "\n".join(essential[-5:])  # Last 5 relevant lines

Practical Session Management in Claude Code

# 1. Start sessions with clear, focused scope
# BAD: "Fix all the bugs in the project"
# GOOD: "Fix the authentication bug in auth.py line 42"

# 2. Use /compact at natural breakpoints
# After completing a task:
/compact

# 3. Start new sessions for unrelated tasks
# Don't reuse a session from bug fixing for a new feature

# 4. Use CLAUDE.md to persist critical context across compactions
# Claude Code reads this file automatically

Cost Monitoring Script

import json
from datetime import datetime, timedelta

def estimate_daily_claude_code_cost(
    sessions_per_day: int = 3,
    interactions_per_session: int = 30,
    avg_growth_per_interaction: int = 5000,  # tokens
    model_input_rate: float = 5.0,  # $/MTok
    model_output_rate: float = 25.0,
    avg_output_tokens: int = 2000,
    compact_interval: int = 0,  # 0 = no compaction
) -> dict:
    """Estimate daily Claude Code spending."""
    total_input_cost = 0
    total_output_cost = 0

    for _ in range(sessions_per_day):
        context = 0
        for i in range(interactions_per_session):
            context += avg_growth_per_interaction

            if compact_interval > 0 and (i + 1) % compact_interval == 0:
                context = int(context * 0.33)

            input_cost = context * model_input_rate / 1_000_000
            output_cost = avg_output_tokens * model_output_rate / 1_000_000
            total_input_cost += input_cost
            total_output_cost += output_cost

    return {
        "daily_input_cost": f"${total_input_cost:.2f}",
        "daily_output_cost": f"${total_output_cost:.2f}",
        "daily_total": f"${total_input_cost + total_output_cost:.2f}",
        "monthly_total": f"${(total_input_cost + total_output_cost) * 30:.2f}",
    }

# Without compaction
print("No compaction:", estimate_daily_claude_code_cost())
# With compaction every 15 interactions
print("Compact@15:", estimate_daily_claude_code_cost(compact_interval=15))
# With compaction every 10 interactions
print("Compact@10:", estimate_daily_claude_code_cost(compact_interval=10))

The Tradeoffs

The average developer spending $6/day on Claude Code is already on the cheaper end. Industry data shows 90% of Claude Code users spend under $12/day. For many developers, the productivity gains justify the cost without optimization.

Context management adds cognitive overhead. Remembering to compact, starting new sessions, and writing CLAUDE.md files takes time. If your time is worth more than the savings, focus only on the highest-impact optimizations.

Aggressive compaction can disrupt complex debugging workflows where Claude needs the full error history to find root causes. Keep full context during active debugging; compact after resolution.

Implementation Checklist

  1. Track your session lengths (interactions per session)
  2. Identify sessions that regularly exceed 100K context
  3. Add /compact to your workflow after completing each subtask
  4. Start new sessions for unrelated tasks
  5. Use CLAUDE.md for persistent context across sessions
  6. Monitor daily spend through your Anthropic dashboard

Measuring Impact

Track daily Claude Code spend before and after adopting context management habits. The industry benchmark is $6/day average. If you are above that, context management can bring you in line. Measure interactions per session and context size at session end — target keeping final context under 80K tokens through strategic compaction.