Reduce Claude Code Token Consumption by 60%

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS

The average Claude Code developer processes roughly 2 million tokens per day across sessions, which works out to about $6/day at API-equivalent rates (an implied blended rate of about $3 per million tokens). With five specific techniques, you can cut that to 800,000 tokens, or $2.40/day, without losing productivity. The techniques, with their share of total daily tokens saved, are: targeted file reads (35%), strategic /compact usage (15%), session splitting (5%), lean CLAUDE.md (3%), and filtered command output (2%). Combined, they deliver a roughly 60% reduction in token consumption.

The Setup

Token consumption in Claude Code comes from five sources. File reads are the largest (35-45% of total tokens), followed by command outputs (15-25%), model responses (15-20%), conversation history (10-15%), and tool overhead (5-10%). Each source has its own reduction technique. Most developers focus on the model’s response verbosity (asking for shorter answers), but that is one of the weaker levers. The biggest wins come from controlling what goes into the context, namely file reads and command outputs, because input tokens vastly outnumber output tokens in coding sessions.
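To make the input-versus-output point concrete, here is a minimal sketch that takes the midpoint of each range quoted above and splits the total into input and output. Treating model responses as the only output source, and everything else (including tool overhead) as input, is an assumption made for illustration.

```python
# Share of total tokens per source, as (low %, high %) ranges from the text.
SOURCE_SHARES = {
    "file_reads": (35, 45),
    "command_outputs": (15, 25),
    "model_responses": (15, 20),
    "conversation_history": (10, 15),
    "tool_overhead": (5, 10),
}

# Assumption: only model responses are output tokens; every other source
# is treated as input that gets sent to the model.
OUTPUT_SOURCES = {"model_responses"}


def midpoint(share_range: tuple[int, int]) -> float:
    """Midpoint of a (low, high) percentage range."""
    low, high = share_range
    return (low + high) / 2


def input_output_split(shares: dict[str, tuple[int, int]]) -> tuple[float, float]:
    """Return (input_fraction, output_fraction) using range midpoints."""
    mids = {name: midpoint(r) for name, r in shares.items()}
    total = sum(mids.values())
    output = sum(v for name, v in mids.items() if name in OUTPUT_SOURCES)
    return (total - output) / total, output / total


input_frac, output_frac = input_output_split(SOURCE_SHARES)
print(f"input: {input_frac:.0%}, output: {output_frac:.0%}")
```

Under these assumptions roughly four out of five tokens are input, which is why trimming responses alone moves the total so little.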

The Math

Baseline daily consumption (unoptimized developer):

| Source | Tokens/Session | Sessions/Day | Daily Total |
| --- | --- | --- | --- |
| File reads | 40,000 | 8 | 320,000 |
| Command outputs | 20,000 | 8 | 160,000 |
| Model responses | 15,000 | 8 | 120,000 |
| Tool overhead | 8,000 | 8 | 64,000 |
| Conversation history (growing) | 30,000 avg | 8 | 240,000 |
| System + CLAUDE.md | 2,000 | 8 | 16,000 |
| Total | 115,000 | 8 | 920,000 |

Actual usage is higher because the accumulated context is re-sent with every interaction: ~2M tokens/day
At Opus $5.00/MTok: $10.00/day ($220/month, assuming 22 working days)
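The baseline arithmetic can be checked with a short sketch. The per-session figures come from the table above; the 22 working days per month is an assumption inferred from the $10/day to $220/month conversion, and `daily_cost` is a hypothetical helper, not part of any Claude tooling.

```python
SESSIONS_PER_DAY = 8
WORKING_DAYS_PER_MONTH = 22  # assumption: implied by $10/day -> $220/month

# Per-session token estimates from the baseline table.
TOKENS_PER_SESSION = {
    "file_reads": 40_000,
    "command_outputs": 20_000,
    "model_responses": 15_000,
    "tool_overhead": 8_000,
    "conversation_history": 30_000,
    "system_and_claude_md": 2_000,
}


def daily_cost(tokens_per_day: int, usd_per_mtok: float) -> float:
    """Dollar cost of a day's tokens at a flat per-million-token rate."""
    return tokens_per_day / 1_000_000 * usd_per_mtok


first_pass_daily = sum(TOKENS_PER_SESSION.values()) * SESSIONS_PER_DAY
actual_daily = 2_000_000  # figure quoted above, including re-sent context
resend_multiplier = actual_daily / first_pass_daily

opus_daily = daily_cost(actual_daily, 5.00)
print(f"first-pass tokens/day: {first_pass_daily:,}")
print(f"re-send multiplier: {resend_multiplier:.2f}x")
print(f"Opus: ${opus_daily:.2f}/day, ${opus_daily * WORKING_DAYS_PER_MONTH:.2f}/month")
```

Calling `daily_cost(actual_daily, 3.00)` returns 6.0, which matches the $6/day quoted in the introduction, so that figure appears to assume a blended rate closer to $3/MTok than the Opus rate used here.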

After optimization:

| Technique | Reduction | Tokens Saved/Day |
| --- | --- | --- |
| Targeted file reads | 50% of file tokens | 160,000 |
| /compact usage | 40% of history tokens | 96,000 |
| Session splitting | 20% of re-sent context | 120,000 |
| Lean CLAUDE.md | 60% of system tokens | 9,600 |
| Filtered command output | 40% of command tokens | 64,000 |
| Total saved | | 449,600 |

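Optimized daily total: ~800,000 tokens
At Opus $5.00/MTok: $4.00/day ($88/month)
Savings: $132/month (60%)

The daily total falls by more than the direct savings listed above because tokens that never enter the context are also never re-sent on later turns.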
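As a quick check on the before/after figures, the sketch below converts both daily totals to monthly cost at the same Opus rate. The 22 working days per month is an assumption inferred from the monthly figures, and `monthly_cost` is a hypothetical helper.

```python
USD_PER_MTOK = 5.00          # Opus rate used above
WORKING_DAYS_PER_MONTH = 22  # assumption: implied by $4/day -> $88/month


def monthly_cost(tokens_per_day: int) -> float:
    """Monthly dollar cost for a given daily token total."""
    return tokens_per_day / 1_000_000 * USD_PER_MTOK * WORKING_DAYS_PER_MONTH


baseline = monthly_cost(2_000_000)
optimized = monthly_cost(800_000)
saved = baseline - optimized
print(f"baseline ${baseline:.0f}, optimized ${optimized:.0f}, "
      f"saved ${saved:.0f} ({saved / baseline:.0%})")
```

Because the rate and the day count cancel out, the percentage saved depends only on the token totals: 800,000 is 40% of 2,000,000.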

The Technique

The five techniques compound when used together, and each also works on its own.

Technique 1: Targeted file reads (biggest impact)

# BEFORE: Claude reads 20 files to understand the codebase
# 20 files x 2,000 tokens = 40,000 tokens per session

# AFTER: Tell Claude exactly which files matter
# In your prompt: "The bug is in src/auth/login.ts around line 85.
# The test is in tests/auth/login.test.ts. Only read these two files."
# 2 files x 2,000 tokens = 4,000 tokens per session

# Savings: 36,000 tokens per session
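Before pointing Claude at a set of files, it can help to estimate what reading them will cost. The sketch below uses a rough 4-characters-per-token heuristic, which is an assumption and not a real tokenizer; `estimate_tokens` and `read_cost` are hypothetical helper names.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count from character length."""
    return len(text) // CHARS_PER_TOKEN


def read_cost(paths: list[str]) -> dict[str, int]:
    """Estimated tokens spent reading each path that exists as a file."""
    costs = {}
    for p in paths:
        path = Path(p)
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="ignore")
            costs[p] = estimate_tokens(text)
    return costs


if __name__ == "__main__":
    # Paths taken from the example prompt above; adjust to your project.
    targets = ["src/auth/login.ts", "tests/auth/login.test.ts"]
    costs = read_cost(targets)
    for name, tokens in costs.items():
        print(f"{name}: ~{tokens:,} tokens")
    print(f"total: ~{sum(costs.values()):,} tokens")
```

Running it on a candidate file list shows quickly whether a prompt is about to pull in 4,000 tokens or 40,000.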

Technique 2: Strategic /compact usage

# Run /compact at these checkpoints:
# 1. After initial investigation is complete
# 2. After each bug fix or feature implementation
# 3. Before starting a new subtask in the same session
# 4. Whenever you feel context is "heavy" (>100K tokens)

# Expected reduction: 50-70% of accumulated context
# A session at 150K drops to ~50K after /compact
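The effect of compacting at a threshold can be sketched with a toy simulation. It assumes every turn re-sends the whole context and that compaction keeps about a third of it (150K down to ~50K, as in the example above). Both are simplifications, and real sessions will also be affected by prompt caching.

```python
COMPACT_THRESHOLD = 100_000  # the ">100K tokens" checkpoint above
COMPACT_KEEP = 1 / 3         # assumption: 150K -> ~50K, as in the example


def simulate(growth_per_turn: int, turns: int, compact: bool) -> int:
    """Return total tokens sent over a session.

    Each turn re-sends the whole context, so the total is the sum of the
    context size at every turn.
    """
    context = 0
    total_sent = 0
    for _ in range(turns):
        context += growth_per_turn
        total_sent += context
        # Compact after the turn that pushed the context over the threshold.
        if compact and context > COMPACT_THRESHOLD:
            context = int(context * COMPACT_KEEP)
    return total_sent


without = simulate(10_000, 30, compact=False)
with_compact = simulate(10_000, 30, compact=True)
print(f"no /compact:   {without:,} tokens sent")
print(f"with /compact: {with_compact:,} tokens sent "
      f"({1 - with_compact / without:.0%} less)")
```

The growth rate and turn count here are hypothetical; the point is that the saving comes from every later turn re-sending a smaller context, not from the compaction step itself.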

Technique 3: Session splitting

# BEFORE: One session, 3 tasks
# Task 1: 0 -> 80K context
# Task 2: 80K -> 160K context (task 1 history still present)
# Task 3: 160K -> 240K context (tasks 1+2 history)
# Average context per interaction: ~120K

# AFTER: Three sessions
# Session 1: 0 -> 80K context (task 1 only)
# Session 2: 0 -> 80K context (task 2 only)
# Session 3: 0 -> 80K context (task 3 only)
# Average context per interaction: ~40K (3x lower)
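The averages above follow from assuming context grows linearly within each task. A minimal sketch of that calculation, with a hypothetical 10 interactions per task:

```python
def average_context(task_sizes: list[int], turns_per_task: int, split: bool) -> float:
    """Mean context size seen by an interaction, assuming linear growth."""
    samples = []
    context = 0.0
    for size in task_sizes:
        if split:
            context = 0.0  # a fresh session starts with empty history
        step = size / turns_per_task
        for _ in range(turns_per_task):
            # Sample the context at the midpoint of each turn's growth.
            samples.append(context + step / 2)
            context += step
    return sum(samples) / len(samples)


tasks = [80_000, 80_000, 80_000]  # three tasks, 80K of context each
single = average_context(tasks, turns_per_task=10, split=False)
separate = average_context(tasks, turns_per_task=10, split=True)
print(f"one session:    ~{single:,.0f} tokens per interaction")
print(f"three sessions: ~{separate:,.0f} tokens per interaction")
print(f"ratio: {single / separate:.1f}x")
```

The ratio equals the number of equally sized tasks, so the benefit of splitting grows with how many unrelated tasks would otherwise share one session.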

Technique 4: Lean CLAUDE.md

# CLAUDE.md - Minimal effective version (under 150 tokens)

## Stack
TypeScript, Next.js 14, Prisma, PostgreSQL

## Structure
src/app/ (routes), src/lib/ (utils), src/components/ (UI)

## Commands
pnpm dev | pnpm test | pnpm lint | pnpm build

## Rules
- Functional components, server-first
- Zod validation on all inputs
- Tests required for new functions
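To keep a CLAUDE.md lean over time, it helps to see which section is growing. The sketch below estimates tokens per `## ` section using the same rough 4-characters-per-token heuristic, which is an assumption and not a real tokenizer; `section_tokens` is a hypothetical helper.

```python
CHARS_PER_TOKEN = 4  # assumption: rough heuristic, not a real tokenizer
TOKEN_BUDGET = 150   # the "under 150 tokens" target above

CLAUDE_MD = """\
## Stack
TypeScript, Next.js 14, Prisma, PostgreSQL

## Structure
src/app/ (routes), src/lib/ (utils), src/components/ (UI)

## Commands
pnpm dev | pnpm test | pnpm lint | pnpm build

## Rules
- Functional components, server-first
- Zod validation on all inputs
- Tests required for new functions
"""


def section_tokens(markdown: str) -> dict[str, int]:
    """Estimate tokens per '## ' section of a CLAUDE.md file."""
    chars: dict[str, int] = {}
    current = "(preamble)"
    for line in markdown.splitlines(keepends=True):
        if line.startswith("## "):
            current = line[3:].strip()
        chars[current] = chars.get(current, 0) + len(line)
    return {name: count // CHARS_PER_TOKEN for name, count in chars.items()}


estimates = section_tokens(CLAUDE_MD)
for name, tokens in estimates.items():
    print(f"{name}: ~{tokens} tokens")
total = sum(estimates.values())
print(f"total: ~{total} tokens (budget {TOKEN_BUDGET}, ok={total <= TOKEN_BUDGET})")
```

In practice you would read the text from your own CLAUDE.md rather than embedding it, and trim whichever section dominates.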

Technique 5: Filtered command output

# BEFORE: Full test output (2,000 tokens)
npm test

# AFTER: Filtered test output (200 tokens)
npm test 2>&1 | grep -E "(PASS|FAIL|Error|✓|✗)" | tail -20

# BEFORE: Full build output (5,000 tokens)
npm run build

# AFTER: Errors only (100-500 tokens)
npm run build 2>&1 | grep -i error | tail -10

# BEFORE: Full git diff (3,000 tokens)
git diff

# AFTER: Summary only (200 tokens)
git diff --stat
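The same filtering can be done from a small wrapper script when a shell pipeline is inconvenient. This is a sketch using only the standard library; `filter_output` and `run_filtered` are hypothetical helper names, and the sample text is made up for illustration.

```python
import re
import subprocess


def filter_output(text: str, pattern: str, max_lines: int = 20) -> str:
    """Keep only lines matching pattern, then the last max_lines of those."""
    regex = re.compile(pattern)
    kept = [line for line in text.splitlines() if regex.search(line)]
    return "\n".join(kept[-max_lines:])


def run_filtered(cmd: list[str], pattern: str, max_lines: int = 20) -> tuple[int, str]:
    """Run cmd with stderr merged into stdout; return (exit_code, filtered_output)."""
    proc = subprocess.run(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    return proc.returncode, filter_output(proc.stdout, pattern, max_lines)


# Made-up sample output, filtered the same way as the grep example above.
sample = "PASS a.test.ts\nsome noise\nFAIL b.test.ts\nError: boom\n"
print(filter_output(sample, r"PASS|FAIL|Error"))
```

A call such as `run_filtered(["npm", "test"], r"PASS|FAIL|Error|✓|✗")` mirrors the first shell example. One design difference is that the wrapper returns the command's own exit code, whereas a plain shell pipeline reports the exit status of its last command unless `pipefail` is set.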

Put it all together:

# Token budget calculator for a coding session
def plan_session(tasks: list[dict]) -> dict:
    """Plan a coding session with token budgets."""
    total_budget = 0
    plan = []

    for task in tasks:
        files = task.get("files", 2)
        avg_file_tokens = task.get("avg_file_tokens", 2000)
        interactions = task.get("interactions", 8)

        file_tokens = files * avg_file_tokens
        command_tokens = interactions * 300  # filtered outputs
        response_tokens = interactions * 800
        overhead_tokens = interactions * 600  # tool defs
        total = file_tokens + command_tokens + response_tokens + overhead_tokens

        plan.append({
            "task": task["name"],
            "estimated_tokens": total,
            "compact_after": total > 80000,
        })
        total_budget += total

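    return {
        "tasks": plan,
        "total_estimated_tokens": total_budget,
        # Opus at $5.00 per million tokens, as in The Math above
        "estimated_cost_opus": f"${total_budget * 5 / 1_000_000:.2f}",
        # Ceiling division: a 150K-token plan needs 2 sessions, not 1
        "sessions_recommended": max(1, -(-total_budget // 80000)),
    }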

result = plan_session([
    {"name": "Fix auth bug", "files": 3, "interactions": 10},
    {"name": "Add email validation", "files": 2, "interactions": 8},
    {"name": "Write unit tests", "files": 4, "interactions": 12},
])
for task in result["tasks"]:
    print(f"  {task['task']}: ~{task['estimated_tokens']:,} tokens"
          f"{' (compact after)' if task['compact_after'] else ''}")
print(f"Total: ~{result['total_estimated_tokens']:,} tokens")
print(f"Estimated cost (Opus): {result['estimated_cost_opus']}")
print(f"Sessions recommended: {result['sessions_recommended']}")

The Tradeoffs

Aggressive optimization can slow you down if it requires constant mental overhead about token budgets. Find a sustainable routine: targeted file reads and filtered outputs are easy habits that pay off immediately. Session splitting is natural for task-based workflows but awkward for exploratory coding. /compact costs almost nothing to run, but it is lossy and may discard details you need later. The goal is 80/20: if applying all five at once becomes a burden, adopt the two or three highest-impact techniques that fit your workflow.

Implementation Checklist

Measuring Impact

Measure token consumption before and after by tracking session length and interaction count. On subscription plans, the impact shows up as fewer rate-limit encounters and more completed tasks per day. On API usage, calculate daily spend directly. The average developer on Claude Code spends about $6/day at API rates, and 90% spend under $12/day. After applying these five techniques, target $2.40-$3.60/day (a 40-60% reduction from $6). Track weekly averages rather than daily figures to smooth out variation in task complexity.
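Weekly averaging is simple to script once you have daily token totals. In the sketch below, the $3/MTok rate is an assumption inferred from 2M tokens costing about $6/day, and the week of data is hypothetical, not measured.

```python
from statistics import mean

USD_PER_MTOK = 3.00          # assumption: rate implied by 2M tokens ≈ $6/day
TARGET_RANGE = (2.40, 3.60)  # target daily spend from the text


def weekly_average_spend(daily_tokens: list[int]) -> float:
    """Average daily spend in dollars over the given days."""
    return mean(daily_tokens) / 1_000_000 * USD_PER_MTOK


def on_target(avg_spend: float) -> bool:
    """True when average spend is at or below the upper end of the target."""
    return avg_spend <= TARGET_RANGE[1]


# Hypothetical week of daily token totals (not measured data).
week = [1_200_000, 900_000, 1_000_000, 800_000, 1_100_000]
avg = weekly_average_spend(week)
status = "on target" if on_target(avg) else "above target"
print(f"weekly average: ${avg:.2f}/day ({status})")
```

Comparing one week's average against the previous week's gives a steadier signal than any single day, since one large refactor can dominate a day's total.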