Reduce Claude Code API Costs by 50%

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS Join 50+ builders · More at zovo.one

The Problem

Your Claude Code API bills are higher than expected. You are spending $20-30 per active day when the average enterprise cost is around $13 per developer per active day, and 90% of users stay under $30.

Quick Fix

Switch to Sonnet for day-to-day work and use /clear between tasks:

/model sonnet
/clear

What’s Happening

Claude Code charges by API token consumption. Token costs scale with context size: the more context Claude processes, the more tokens you pay for. Every message you send includes the full conversation context, so a bloated session means every subsequent API call is expensive.

The three biggest cost drivers are model selection (Opus costs roughly 5x more than Sonnet per token), context size (stale conversations carry dead weight), and unnecessary file reads (each file read adds tokens that persist until compaction).

Step-by-Step Fix

Step 1: Choose the right model

Sonnet handles most coding tasks well at a fraction of Opus cost:

Model Input cost Output cost
Claude Opus 4.6 $5/MTok $25/MTok
Claude Sonnet 4.6 $3/MTok $15/MTok
Claude Haiku 4.5 $1/MTok $5/MTok

Switch with /model sonnet. Reserve Opus for complex architectural decisions. Set a default:

{
  "env": {
    "ANTHROPIC_MODEL": "claude-sonnet-4-6"
  }
}

Step 2: Use /cost to track spend

Monitor your session cost:

/cost

This shows total tokens, API duration, and dollar cost for the current session. Watch for sessions that exceed $1-2 and investigate why.

Step 3: Clear between tasks

The single highest-impact habit. Stale context from a previous task wastes tokens on every subsequent message:

/rename auth-feature
/clear

Step 4: Use subagents for exploration

When Claude needs to search through many files, delegate to a subagent. The search results stay in the subagent’s context, not yours:

Use the Explore agent to find all API endpoints that handle authentication

The Explore subagent runs on Haiku (cheapest model) with read-only access. For custom subagents, set the model explicitly:

---
model: haiku
---

Step 5: Reduce MCP server overhead

Disable MCP servers you are not using:

/mcp

Prefer CLI tools (gh, aws, gcloud) over MCP servers. CLI tools add zero context overhead.

Step 6: Keep CLAUDE.md lean

Every line of CLAUDE.md consumes tokens on every message. Target under 200 lines. Move detailed procedures into skills that load on demand.

Step 7: Configure custom compaction

Tell Claude what to keep during compaction:

/compact Keep code samples and test results, drop exploration output

Step 8: Use hooks to reduce context

A PreToolUse hook can filter large outputs before Claude sees them:

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          {
            "type": "command",
            "command": "~/.claude/hooks/filter-test-output.sh"
          }
        ]
      }
    ]
  }
}

Step 9: Use batch processing for bulk work

If you are processing many similar requests, use the Message Batches API at 50% cost:

message_batch = client.messages.batches.create(requests=batch_requests)

Batch pricing: Sonnet drops to $1.50/MTok input, $7.50/MTok output.

Step 10: Set workspace spend limits

For teams, set workspace spend limits in the Claude Console to prevent runaway costs. Each workspace can have its own limit.

Cost benchmarks

Enterprise averages for reference:

If you are significantly above these, the strategies above will bring you into range.

Prevention

Build these habits:

  1. Default to Sonnet, switch to Opus only when needed
  2. /clear between every distinct task
  3. /cost check before and after complex operations
  4. Subagents for any exploratory work
  5. Lean CLAUDE.md, skills for details

Level Up Your Claude Code Workflow

The developers who get the most out of Claude Code aren’t just fixing errors — they’re running multi-agent pipelines, using battle-tested CLAUDE.md templates, and shipping with production-grade operating principles.

Get Claude Code Mastery — included in Zovo Lifetime →

16 CLAUDE.md templates · 80+ prompts · orchestration configs · workflow playbooks. $99 once, free forever.