Typical unoptimized request on Opus 4.7: System prompt: 2,000 tokens Tool definitions: 1,500 tokens (3 tools) User message: 3,000 tokens Total input: 6,500 tokens -> $0.0325/request After optimization: System prompt: 500 tokens (75% reduction) Tool definitions: 500 tokens (1 to...

How to Reduce Claude API Token Usage (2026)

Last updated: April 19, 2026

A production Claude Opus 4.7 pipeline processing 10,000 requests per day with a 2,000-token system prompt spends $100/day on system prompt tokens alone. Compressing that prompt to 500 tokens cuts daily input costs by $75 — saving $2,250/month on a single optimization.

The Setup

Token usage directly determines your Claude API bill. Every input token costs $5.00 per million on Opus 4.7, $3.00 on Sonnet 4.6, or $1.00 on Haiku 4.5. Every output token costs 5x the input rate.

Most developers write prompts for clarity, not efficiency. Verbose instructions, unused tool definitions, and bloated system prompts silently inflate every request. This guide covers five concrete techniques to cut token usage by 50% without reducing output quality.

Target: developers spending $500-$5,000/month on Claude API who have not optimized their prompts for token efficiency.

The Math

Typical unoptimized request on Opus 4.7:

System prompt: 2,000 tokens
Tool definitions: 1,500 tokens (3 tools)
User message: 3,000 tokens
Total input: 6,500 tokens -> $0.0325/request

After optimization:

System prompt: 500 tokens (75% reduction)
Tool definitions: 500 tokens (1 tool, only when needed)
User message: 2,500 tokens (structured formatting)
Total input: 3,500 tokens -> $0.0175/request

Savings per request: $0.015 (46%)

At 10,000 requests/day: $150/day saved -> $4,500/month

Output optimization (constrain max tokens, request concise responses):

Before: average 1,500 output tokens -> $0.0375
After: average 800 output tokens -> $0.020
Additional savings: $0.0175/request -> $5,250/month

Combined savings: $9,750/month (both input and output)

The Technique

1. Compress System Prompts

# BEFORE: 2,000+ tokens
VERBOSE_SYSTEM = """
You are a helpful AI assistant specialized in analyzing customer support tickets.
Your role is to carefully read each ticket, understand the customer's issue,
and provide a classification that our support team can use to route the ticket
to the appropriate department. You should consider the following categories
when classifying tickets: billing issues, technical problems, account access,
feature requests, and general inquiries. Please be thorough in your analysis
and provide a confidence score along with your classification. It is important
that you respond in a structured format so that our automated systems can
parse your response correctly. Always include the category, confidence score,
and a brief explanation of why you chose that category.
"""
# AFTER: 500 tokens — same behavior, fewer tokens
LEAN_SYSTEM = """Classify support tickets into: billing, technical, access, feature_request, general.
Output format: {"category": "...", "confidence": 0.0-1.0, "reason": "one sentence"}
Output only valid JSON. No preamble."""

2. Remove Unused Tool Definitions

Each tool definition adds tokens to every request. The bash tool adds 245 tokens, the text editor adds 700 tokens, and computer use adds 735 tokens per call.

import anthropic
client = anthropic.Anthropic()
# BAD: Including tools you don't need
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[
        {"name": "search_db", "description": "Search the database", "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}}},
        {"name": "send_email", "description": "Send an email", "input_schema": {"type": "object", "properties": {"to": {"type": "string"}, "body": {"type": "string"}}}},
        {"name": "update_record", "description": "Update a record", "input_schema": {"type": "object", "properties": {"id": {"type": "string"}, "data": {"type": "object"}}}},
    ],
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
# Wasted ~1,000+ tokens on tool definitions for a simple question

# GOOD: Only include tools when the task needs them
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

3. Structure User Messages for Token Efficiency

# BEFORE: Narrative style (~800 tokens)
verbose_prompt = """
I have a JSON file that contains customer data. Each customer has a name,
email, phone number, and subscription status. I need you to go through
the data and find all customers whose subscription has expired. For each
expired customer, I need their name and email address. The data is as follows:
[paste large JSON block here]
"""
# AFTER: Structured style (~400 tokens)
lean_prompt = """Extract expired subscriptions from this data.
Return: name, email for each.
Output: JSON array.
Data:
[paste JSON block]"""

4. Constrain Output Length

# Request concise output explicitly
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=200,  # Hard cap prevents runaway output
    system="Respond in under 100 words. No preamble. No summary.",
    messages=[{"role": "user", "content": "Explain what a mutex does in concurrent programming."}],
)

5. Use Abbreviations in Structured Prompts

# BEFORE: 150 tokens
system = "When analyzing code, check for: security vulnerabilities, performance issues, maintainability concerns, error handling gaps, and documentation completeness."
# AFTER: 50 tokens
system = "Code analysis checklist: security, perf, maintainability, error handling, docs."

The Tradeoffs

Aggressive prompt compression can reduce output quality. If you compress too far, Claude may misinterpret instructions or produce less accurate results. Always A/B test compressed prompts against originals on 50+ representative inputs.

Setting max_tokens too low causes truncated responses. Start by measuring your actual output token distribution, then set max_tokens to the 95th percentile — this prevents waste without cutting off legitimate responses.

Removing tool definitions when tools might be needed causes failures. Only prune tools when you are certain the request type never requires them.

Implementation Checklist

Measure current token usage per request type (input and output separately)
Compress your top 3 system prompts by removing redundant instructions
Audit tool definitions — remove any not used by the current request type
Add max_tokens constraints based on actual output length distribution
A/B test compressed prompts against originals for quality
Deploy optimized prompts and monitor token usage reduction
Repeat monthly as your prompt library evolves

Measuring Impact

Compare average tokens per request before and after optimization. Track four metrics: average input tokens per request (target: 40-60% reduction), average output tokens per request (target: 20-40% reduction), total monthly token consumption, and quality score on a held-out test set (target: less than 3% degradation). Use the Anthropic usage dashboard to verify token counts are dropping as expected.

Which model? → Take the 5-question quiz in our Model Selector.

Estimate tokens → Calculate your usage with our Token Estimator.

Try it: Estimate your monthly spend with our Cost Calculator.

Claude Code Token Usage Optimization — comprehensive token optimization strategies
Reduce Claude Code Hallucinations Save Tokens — write better prompts that use fewer tokens
Claude Skill Token Usage Profiling — measure per-skill token consumption

How to Reduce Claude API Token Usage (2026)

The Setup

The Math

The Technique

1. Compress System Prompts

2. Remove Unused Tool Definitions

3. Structure User Messages for Token Efficiency

4. Constrain Output Length

5. Use Abbreviations in Structured Prompts

The Tradeoffs

Implementation Checklist

Measuring Impact

See Also

About the Author

The Setup

The Math

The Technique

1. Compress System Prompts

2. Remove Unused Tool Definitions

3. Structure User Messages for Token Efficiency

4. Constrain Output Length

5. Use Abbreviations in Structured Prompts

The Tradeoffs

Implementation Checklist

Measuring Impact

Related Guides

See Also

About the Author

Related Guides