Reducing Claude Code Token Consumption: 8 Techniques (2026)
Claude Code’s token usage is not fixed – it is a function of how you configure your project and write your prompts. The difference between an optimized and unoptimized setup is 40-60% in monthly spend. These eight techniques are ordered by impact, with measured before/after token counts from real projects. Use the Token Estimator to model the savings for your specific codebase.
Technique 1: Configure .claudeignore
Impact: -25 to -40% input tokens
Claude Code reads files to understand your codebase. Without a .claudeignore, it may read build artifacts, dependency directories, and generated files that add thousands of tokens without useful context.
# Create .claudeignore at project root
cat > .claudeignore << 'EOF'
node_modules/
dist/
build/
.next/
coverage/
*.lock
*.min.js
*.map
__pycache__/
*.pyc
.venv/
target/
vendor/
EOF
Before: Claude reads node_modules/lodash/index.js during a search – 3,200 tokens wasted.
After: Excluded entirely. Zero tokens spent on dependencies.
Technique 2: Use /compact Strategically
Impact: -30 to -50% on long sessions
Every message resends the full conversation history. After 10 exchanges, context accumulation dominates your token bill. Run /compact after completing a sub-task.
# Session flow:
# Messages 1-5: Implement user auth (accumulated: 45,000 tokens)
/compact
# Context drops to ~8,000 tokens (summary)
# Messages 6-10: Add rate limiting (starts from 8,000 instead of 45,000)
Before (15-message session): 180,000 total tokens (context snowball). After (/compact at message 7): 105,000 total tokens. 42% reduction.
Technique 3: Write Focused Prompts
Impact: -20 to -35% per task
Vague prompts cause Claude Code to search broadly. Specific prompts let it go directly to the relevant code.
# Unfocused (triggers broad codebase scan):
"Fix the authentication bug"
# Claude reads 12 files, 28,000 input tokens
# Focused (targets exact location):
"Fix the JWT expiration check in src/auth/verify.ts -- the
token.exp comparison on line 47 doesn't account for clock skew"
# Claude reads 2 files, 6,000 input tokens
Point Claude Code to specific files, line numbers, and error messages. Every file it does not need to read is 1,000-5,000 tokens saved.
Technique 4: Optimize Your CLAUDE.md
Impact: -10 to -20% across all sessions
Your CLAUDE.md file is loaded into every session’s context. A bloated CLAUDE.md with lengthy explanations wastes tokens on every single interaction.
# BAD: 2,000 tokens of CLAUDE.md (loaded every message)
## Project Overview
This is a comprehensive e-commerce platform built with...
[500 words of background]
## Development Philosophy
We follow clean architecture principles because...
[300 words of philosophy]
# GOOD: 400 tokens of CLAUDE.md (concise, actionable)
## Stack
- Next.js 15 App Router, TypeScript strict, Tailwind
- Supabase (auth, DB, edge functions)
- pnpm, Vitest, Playwright
## Rules
- All functions < 60 lines. 2+ assertions per function.
- Never modify migrations/. Always create new ones.
- Run `pnpm test` before suggesting changes complete.
Use the CLAUDE.md Generator to create an optimized configuration that includes only actionable rules.
Technique 5: Choose the Right Model
Impact: -15 to -30% on cost (varies by task)
Opus produces 30% more output tokens than Sonnet for the same task. For routine tasks (formatting, simple fixes, documentation), Haiku or Sonnet deliver equivalent results at a fraction of the cost.
# Use Sonnet for daily development (default)
claude --model sonnet
# Use Opus only for complex architecture decisions
claude --model opus
# Use Haiku for batch operations and simple tasks
claude --model haiku
Match model to task complexity. Use the Model Selector for guidance on which model fits each task type.
Technique 6: Scope Sessions to Single Tasks
Impact: -20 to -30% vs multi-task sessions
Starting a new session for each task resets context to zero. Multi-task sessions accumulate context from completed tasks that is irrelevant to the current one.
# Bad: One long session for everything
claude # Task 1: fix auth bug (context grows to 40K)
# Task 2: add logging (context at 80K, includes auth code)
# Task 3: update docs (context at 120K, includes auth + logging)
# Good: Fresh session per task
claude # Task 1: fix auth bug (context: 15K)
claude # Task 2: add logging (context: 12K)
claude # Task 3: update docs (context: 8K)
# Total: 35K vs 120K
Technique 7: Limit File Reads with Specific Paths
Impact: -10 to -20% per task
When Claude Code searches for relevant files, it may read 10+ files. You can preempt this by specifying exactly which files it should look at.
# Let Claude explore (reads 8 files):
"Add error handling to the payment flow"
# Constrain to specific files (reads 2 files):
"Add try/catch with specific error types to
src/payments/charge.ts and src/payments/refund.ts"
Technique 8: Prune MCP Tool Definitions
Impact: -5 to -15% input tokens
Each MCP server adds tool definitions to every request. Five MCP servers with 10 tools each can add 8,000-12,000 tokens of tool schemas to every single message.
// .claude/settings.json -- only enable tools you actively use
{
"mcpServers": {
"filesystem": { "command": "..." },
"git": { "command": "..." }
// Removed: database, docker, kubernetes MCP servers
// that were rarely used but added 6,000 tokens/request
}
}
Review your MCP configuration in Claude Code Configuration and remove servers you do not use daily.
Combined Impact
Applying all eight techniques to a typical development workflow:
| Metric | Before | After | Savings |
|---|---|---|---|
| Avg session tokens | 65,000 | 28,000 | 57% |
| Monthly tokens (solo dev) | 13M | 5.6M | 57% |
| Monthly cost (Sonnet) | $78 | $34 | $44/mo |
Try It Yourself
Run the Token Estimator to calculate your current baseline and projected savings. Input your codebase size, session frequency, and current configuration. The estimator shows which techniques deliver the largest reduction for your specific setup.
Frequently Asked Questions
Which technique should I implement first?
Start with .claudeignore (Technique 1) -- it takes 30 seconds and delivers the largest single improvement. Then add /compact to your workflow (Technique 2). These two changes alone typically cut usage by 35-45%.Can I apply all techniques simultaneously?
Yes, and they compound. However, the total savings will be less than the sum of individual percentages because some techniques overlap. A .claudeignore reduces the files Claude reads, which also reduces the benefit of focused prompts since there are fewer irrelevant files to explore. Realistic combined savings is 40-60%.Do these techniques reduce output quality?
No. These techniques reduce wasted tokens -- context that does not contribute to the response. Claude Code produces equivalent or better output when given focused context rather than a sprawling, unfocused input. The Token Estimator confirms quality is maintained across optimization levels.How often should I run /compact?
Run /compact after every completed sub-task, or when you notice the session has exceeded 10 back-and-forth exchanges. A good rule: if you are about to start a different type of work within the same session, compact first. See the Commands Reference for more details.Configure it → Build your MCP config with our MCP Config Generator.
Related Guides
- Token Estimator – Model savings for your specific project
- Claude Code Cost Calculator – See dollar impact of each technique
- CLAUDE.md Generator – Create an optimized, lean CLAUDE.md
- Commands Reference – Full /compact documentation and other commands
- Configuration Guide – .claudeignore and settings.json reference