Fix: Claude Code High Token Usage

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS

The Error

You ask Claude Code a few simple questions and discover it consumed far more tokens than expected. Simple queries should not cost much, but context accumulation makes each subsequent message more expensive.

Quick Fix

  1. Run /cost in Claude Code to check your token usage
  2. Use /compact to reduce context size
  3. Switch to a smaller model for simple queries: /model sonnet

What Causes This

Token consumption in Claude Code is driven by input tokens (the full conversation context sent with every request), not just output tokens.

1. Full Context Sent Every Request

Every request includes the system prompt, the entire conversation history, and the contents of any files read during the session. Nothing is sent only once: each new message resends all of it. If Claude has read files or generated long responses, the context grows rapidly.
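
To see why this adds up so quickly, here is a minimal sketch of the arithmetic. The numbers (a 5,000-token base context and 2,000 tokens added per turn) are made up for illustration, and the model ignores any caching discounts, so treat it as a rough upper bound rather than a billing calculator.

```python
def cumulative_input_tokens(base_context: int, tokens_per_turn: int, turns: int) -> int:
    """Sum the input tokens sent across `turns` requests in one session.

    base_context and tokens_per_turn are hypothetical figures, not
    values reported by Claude Code.
    """
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn       # this turn's message plus the prior reply
        total += base_context + history  # the full context is resent as input
    return total


if __name__ == "__main__":
    print(cumulative_input_tokens(5_000, 2_000, 10))  # 160000
    print(cumulative_input_tokens(5_000, 2_000, 20))  # 520000
```

Doubling the number of turns more than triples the total input in this example, because every earlier turn is paid for again on each later request.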

2. Extended Thinking Tokens

Extended thinking is enabled by default because it improves performance on complex planning and reasoning tasks. Thinking tokens are billed as output tokens, and the default budget can be tens of thousands of tokens per request depending on the model. For simpler tasks where deep reasoning is not needed, this is wasted spend.

3. Model Pricing

Token costs vary significantly by model:

Model        Input (per MTok)   Output (per MTok)
Opus 4.6     $5                 $25
Sonnet 4.6   $3                 $15
Haiku 4.5    $1                 $5

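At these rates, Opus costs about 1.7 times as much as Sonnet and 5 times as much as Haiku for the same number of tokens, so using it for simple questions is money spent on capability you do not need.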
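
A quick way to compare models is to price the same request at each rate. The sketch below uses the per-MTok prices from the table above; the request size (50,000 input tokens, 2,000 output tokens) is a hypothetical example, and the short model keys are just labels chosen for this snippet.

```python
# Prices in USD per million tokens, copied from the table above.
PRICES_PER_MTOK = {
    "opus": (5.00, 25.00),
    "sonnet": (3.00, 15.00),
    "haiku": (1.00, 5.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one request at the listed rates."""
    input_price, output_price = PRICES_PER_MTOK[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 50,000 input tokens and 2,000 output tokens.
    for model in PRICES_PER_MTOK:
        print(f"{model}: ${request_cost(model, 50_000, 2_000):.2f}")
    # opus: $0.30
    # sonnet: $0.18
    # haiku: $0.06
```

Note how the input side dominates: with a large context, most of the bill comes from what you resend, not from what the model writes.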

Full Solution

1. Check Current Usage

/cost

This shows your token consumption for the current session. For API users, this reflects actual billing. For Max and Pro subscribers, use /stats to view usage patterns instead.

2. Compact Regularly

/compact

This summarizes the conversation history to reduce the context sent with each message. You can add focus instructions:

/compact Focus on code samples and API usage

Run /compact after every major task completion or when switching topics.
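
As a rough illustration of what regular compaction can save, the sketch below models a session where the history is replaced by a short summary every few turns. Everything here is an assumption: the token counts are invented, the summary size is a guess, and the cost of the /compact request itself is not counted.

```python
from typing import Optional


def input_tokens_with_compaction(
    base_context: int,
    tokens_per_turn: int,
    turns: int,
    compact_every: Optional[int] = None,
    summary_tokens: int = 0,
) -> int:
    """Total input tokens for a session, optionally compacting every N turns.

    Assumes compaction replaces the whole history with a summary of
    `summary_tokens` tokens (a simplification for illustration).
    """
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn
        total += base_context + history
        if compact_every and turn % compact_every == 0:
            history = summary_tokens  # history collapses to the summary
    return total


if __name__ == "__main__":
    print(input_tokens_with_compaction(5_000, 2_000, 20))  # 520000
    print(input_tokens_with_compaction(5_000, 2_000, 20, compact_every=5,
                                       summary_tokens=1_500))  # 242500
```

Under these made-up numbers, compacting every five turns cuts total input by more than half over a 20-turn session.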

3. Use the Right Model

For simple questions that do not require deep reasoning:

/model sonnet

Sonnet handles most coding tasks well at lower cost. Reserve Opus for complex architectural decisions or multi-step reasoning.

4. Control Thinking Budget

Reduce the effort level for simple queries:

/effort low

The /effort command accepts low, medium, high, max, and auto. The low, medium, and high settings persist across sessions. You can also set the MAX_THINKING_TOKENS environment variable to cap thinking token usage:

export MAX_THINKING_TOKENS=8000
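
Because thinking tokens are billed as output tokens, the cap translates directly into a worst-case cost per request. This small sketch shows the arithmetic using the Opus output rate from the pricing table; the 32,000-token figure is only a stand-in for "tens of thousands", not a documented default.

```python
def max_thinking_cost(budget_tokens: int, output_price_per_mtok: float) -> float:
    """Worst-case USD spent on thinking if the whole budget is used."""
    return budget_tokens * output_price_per_mtok / 1_000_000


if __name__ == "__main__":
    opus_output_price = 25.00  # USD per MTok, from the pricing table
    print(f"${max_thinking_cost(8_000, opus_output_price):.2f}")   # $0.20
    print(f"${max_thinking_cost(32_000, opus_output_price):.2f}")  # $0.80
```

The model will not always use its full budget, so real spend is usually lower, but the cap bounds how bad a single request can get.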

5. Start New Sessions for New Topics

Instead of continuing a long conversation that has accumulated context:

# Start a new session for a new topic
claude

# Use /clear to reset context without exiting
/clear

Use /clear when switching to unrelated work. Stale context wastes tokens on every subsequent message.

6. Minimize File Reads

Be specific about what you want Claude to look at:

# Instead of: "Look at my project and tell me about the architecture"
# Try: "Read src/index.ts and explain the main function"

Each file read adds its entire content to the conversation context for all subsequent messages.
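
Before pointing Claude at a large file, it can help to estimate what that read will cost over the rest of the session. The sketch below uses the common rule of thumb of roughly 4 characters per token, which is an approximation and not Claude's actual tokenizer; the function names are made up for this example.

```python
import math
from pathlib import Path


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; 4 chars per token is a heuristic, not exact."""
    return math.ceil(len(text) / chars_per_token)


def estimate_file_tokens(path: str) -> int:
    """Estimate the tokens a file would add to the context when read."""
    return estimate_tokens(Path(path).read_text(encoding="utf-8", errors="replace"))


def carried_tokens(file_tokens: int, remaining_turns: int) -> int:
    """Input tokens spent resending the file on each remaining turn."""
    return file_tokens * remaining_turns


if __name__ == "__main__":
    # A hypothetical 3,000-token file read with 15 turns still to go:
    print(carried_tokens(3_000, 15))  # 45000
```

A single mid-sized file read early in a long session can end up costing more input tokens than the whole rest of the conversation.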

7. Use Subagents for Verbose Operations

Running tests, fetching documentation, or processing log files can consume significant context. Delegate these to subagents so the verbose output stays in the subagent’s context while only a summary returns to your main conversation.

Prevention