Reduce Claude Code Token Consumption by 60%
The average Claude Code developer processes roughly 2 million tokens per day across sessions, translating to about $6/day at API-equivalent rates. With five specific techniques, you can cut that to 800,000 tokens – $2.40/day – without losing productivity. The techniques are: targeted file reads (saves 35%), strategic /compact usage (saves 15%), session splitting (saves 5%), lean CLAUDE.md (saves 3%), and filtered command output (saves 2%). Combined, they deliver roughly 60% reduction in token consumption.
The Setup
Token consumption in Claude Code comes from five sources. File reads are the largest (35-45% of total tokens), followed by command outputs (15-25%), model responses (15-20%), tool overhead (5-10%), and conversation history (10-15%). Each source has a specific reduction technique. Most developers focus on the model’s response verbosity (asking for shorter answers), but that’s actually the least impactful lever. The biggest wins come from controlling what goes into the context – file reads and command outputs – because input tokens vastly outnumber output tokens in coding sessions.
The Math
Baseline daily consumption (unoptimized developer):
| Source | Tokens/Session | Sessions/Day | Daily Total |
|---|---|---|---|
| File reads | 40,000 | 8 | 320,000 |
| Command outputs | 20,000 | 8 | 160,000 |
| Model responses | 15,000 | 8 | 120,000 |
| Tool overhead | 8,000 | 8 | 64,000 |
| Conversation history (growing) | 30,000 avg | 8 | 240,000 |
| System + CLAUDE.md | 2,000 | 8 | 16,000 |
| Total | 115,000 | 8 | 920,000 |
Actual usage is higher because context re-sends: ~2M tokens/day At Opus $5.00/MTok: $10.00/day ($220/month)
After optimization:
| Technique | Reduction | Tokens Saved/Day |
|---|---|---|
| Targeted file reads | 50% of file tokens | 160,000 |
| /compact usage | 40% of history tokens | 96,000 |
| Session splitting | 20% of re-sent context | 120,000 |
| Lean CLAUDE.md | 60% of system tokens | 5,760 |
| Filtered command output | 40% of command tokens | 64,000 |
| Total saved | 445,760 |
Optimized daily total: ~800,000 tokens At Opus $5.00/MTok: $4.00/day ($88/month) Savings: $132/month (60%)
The Technique
Apply all five techniques together for compound savings.
Technique 1: Targeted file reads (biggest impact)
# BEFORE: Claude reads 20 files to understand the codebase
# 20 files x 2,000 tokens = 40,000 tokens per session
# AFTER: Tell Claude exactly which files matter
# In your prompt: "The bug is in src/auth/login.ts around line 85.
# The test is in tests/auth/login.test.ts. Only read these two files."
# 2 files x 2,000 tokens = 4,000 tokens per session
# Savings: 36,000 tokens per session
Technique 2: Strategic /compact usage
# Run /compact at these checkpoints:
# 1. After initial investigation is complete
# 2. After each bug fix or feature implementation
# 3. Before starting a new subtask in the same session
# 4. Whenever you feel context is "heavy" (>100K tokens)
# Expected reduction: 50-70% of accumulated context
# A session at 150K drops to ~50K after /compact
Technique 3: Session splitting
# BEFORE: One session, 3 tasks
# Task 1: 0 -> 80K context
# Task 2: 80K -> 160K context (task 1 history still present)
# Task 3: 160K -> 240K context (tasks 1+2 history)
# Average context per interaction: ~120K
# AFTER: Three sessions
# Session 1: 0 -> 80K context (task 1 only)
# Session 2: 0 -> 80K context (task 2 only)
# Session 3: 0 -> 80K context (task 3 only)
# Average context per interaction: ~40K (3x lower)
Technique 4: Lean CLAUDE.md
# CLAUDE.md - Minimal effective version (under 150 tokens)
## Stack
TypeScript, Next.js 14, Prisma, PostgreSQL
## Structure
src/app/ (routes), src/lib/ (utils), src/components/ (UI)
## Commands
pnpm dev | pnpm test | pnpm lint | pnpm build
## Rules
- Functional components, server-first
- Zod validation on all inputs
- Tests required for new functions
Technique 5: Filtered command output
# BEFORE: Full test output (2,000 tokens)
npm test
# AFTER: Filtered test output (200 tokens)
npm test 2>&1 | grep -E "(PASS|FAIL|Error|✓|✗)" | tail -20
# BEFORE: Full build output (5,000 tokens)
npm run build
# AFTER: Errors only (100-500 tokens)
npm run build 2>&1 | grep -i error | tail -10
# BEFORE: Full git diff (3,000 tokens)
git diff
# AFTER: Summary only (200 tokens)
git diff --stat
Put it all together:
# Token budget calculator for a coding session
def plan_session(tasks: list[dict]) -> dict:
"""Plan a coding session with token budgets."""
total_budget = 0
plan = []
for task in tasks:
files = task.get("files", 2)
avg_file_tokens = task.get("avg_file_tokens", 2000)
interactions = task.get("interactions", 8)
file_tokens = files * avg_file_tokens
command_tokens = interactions * 300 # filtered outputs
response_tokens = interactions * 800
overhead_tokens = interactions * 600 # tool defs
total = file_tokens + command_tokens + response_tokens + overhead_tokens
plan.append({
"task": task["name"],
"estimated_tokens": total,
"compact_after": total > 80000,
})
total_budget += total
return {
"tasks": plan,
"total_estimated_tokens": total_budget,
"estimated_cost_opus": f"${total_budget * 5 / 1_000_000:.2f}",
"sessions_recommended": max(1, total_budget // 80000),
}
result = plan_session([
{"name": "Fix auth bug", "files": 3, "interactions": 10},
{"name": "Add email validation", "files": 2, "interactions": 8},
{"name": "Write unit tests", "files": 4, "interactions": 12},
])
for task in result["tasks"]:
print(f" {task['task']}: ~{task['estimated_tokens']:,} tokens"
f"{' (compact after)' if task['compact_after'] else ''}")
print(f"Total: ~{result['total_estimated_tokens']:,} tokens")
print(f"Sessions recommended: {result['sessions_recommended']}")
The Tradeoffs
Aggressive optimization can slow you down if it requires constant mental overhead about token budgets. Find a sustainable routine: targeted file reads and filtered outputs are easy habits that pay off immediately. Session splitting is natural for task-based workflows but awkward for exploratory coding. /compact is free money with minimal downside – just remember it’s lossy and may discard details you need later. The goal is 80/20: adopt the two or three highest-impact techniques that fit your workflow, not all five simultaneously.
Implementation Checklist
- Start with targeted file reads and filtered outputs (largest impact, easiest to adopt)
- Add /compact to your workflow at natural breakpoints
- Split sessions by task once the habit is established
- Trim CLAUDE.md to under 200 tokens
- Track daily token consumption for one week before and one week after
- Adopt techniques incrementally – don’t change everything at once
- Set a daily token budget and review adherence weekly
Measuring Impact
Measure your before/after token consumption by tracking session length and interaction count. On subscription plans, the impact shows as fewer rate limit encounters and more completed tasks per day. On API usage, calculate daily spend directly. The average developer on Claude Code spends $6/day at API rates (90% spend under $12/day). After applying these five techniques, target $2.40-$3.60/day. Track weekly averages rather than daily to smooth out variation from different task complexities.