Reduce Claude Code API Costs by 50%
The Problem
Your Claude Code API bills are higher than expected. You are spending $20-30 per active day when the average enterprise cost is around $13 per developer per active day, and 90% of users stay under $30.
Quick Fix
Switch to Sonnet for day-to-day work and use /clear between tasks:
/model sonnet
/clear
What’s Happening
Claude Code charges by API token consumption. Token costs scale with context size: the more context Claude processes, the more tokens you pay for. Every message you send includes the full conversation context, so a bloated session means every subsequent API call is expensive.
The three biggest cost drivers are model selection (Opus costs roughly 5x more than Sonnet per token), context size (stale conversations carry dead weight), and unnecessary file reads (each file read adds tokens that persist until compaction).
Step-by-Step Fix
Step 1: Choose the right model
Sonnet handles most coding tasks well at a fraction of Opus cost:
| Model | Input cost | Output cost |
|---|---|---|
| Claude Opus 4.6 | $5/MTok | $25/MTok |
| Claude Sonnet 4.6 | $3/MTok | $15/MTok |
| Claude Haiku 4.5 | $1/MTok | $5/MTok |
Switch with /model sonnet. Reserve Opus for complex architectural decisions. Set a default:
{
"env": {
"ANTHROPIC_MODEL": "claude-sonnet-4-6"
}
}
Step 2: Use /cost to track spend
Monitor your session cost:
/cost
This shows total tokens, API duration, and dollar cost for the current session. Watch for sessions that exceed $1-2 and investigate why.
Step 3: Clear between tasks
The single highest-impact habit. Stale context from a previous task wastes tokens on every subsequent message:
/rename auth-feature
/clear
Step 4: Use subagents for exploration
When Claude needs to search through many files, delegate to a subagent. The search results stay in the subagent’s context, not yours:
Use the Explore agent to find all API endpoints that handle authentication
The Explore subagent runs on Haiku (cheapest model) with read-only access. For custom subagents, set the model explicitly:
---
model: haiku
---
Step 5: Reduce MCP server overhead
Disable MCP servers you are not using:
/mcp
Prefer CLI tools (gh, aws, gcloud) over MCP servers. CLI tools add zero context overhead.
Step 6: Keep CLAUDE.md lean
Every line of CLAUDE.md consumes tokens on every message. Target under 200 lines. Move detailed procedures into skills that load on demand.
Step 7: Configure custom compaction
Tell Claude what to keep during compaction:
/compact Keep code samples and test results, drop exploration output
Step 8: Use hooks to reduce context
A PreToolUse hook can filter large outputs before Claude sees them:
{
"hooks": {
"PreToolUse": [
{
"matcher": "Bash",
"hooks": [
{
"type": "command",
"command": "~/.claude/hooks/filter-test-output.sh"
}
]
}
]
}
}
Step 9: Use batch processing for bulk work
If you are processing many similar requests, use the Message Batches API at 50% cost:
message_batch = client.messages.batches.create(requests=batch_requests)
Batch pricing: Sonnet drops to $1.50/MTok input, $7.50/MTok output.
Step 10: Set workspace spend limits
For teams, set workspace spend limits in the Claude Console to prevent runaway costs. Each workspace can have its own limit.
Cost benchmarks
Enterprise averages for reference:
- Average: ~$13 per developer per active day
- 90th percentile: ~$30 per developer per active day
- Monthly average: $150-250 per developer
If you are significantly above these, the strategies above will bring you into range.
Prevention
Build these habits:
- Default to Sonnet, switch to Opus only when needed
/clearbetween every distinct task/costcheck before and after complex operations- Subagents for any exploratory work
- Lean CLAUDE.md, skills for details
Level Up Your Claude Code Workflow
The developers who get the most out of Claude Code aren’t just fixing errors — they’re running multi-agent pipelines, using battle-tested CLAUDE.md templates, and shipping with production-grade operating principles.
Get Claude Code Mastery — included in Zovo Lifetime →
16 CLAUDE.md templates · 80+ prompts · orchestration configs · workflow playbooks. $99 once, free forever.