Why Your Claude Prompts Use Too Many Tokens (2026)

Last updated: April 19, 2026

Tool definitions add up to 1,680 hidden tokens to every Claude API request, even when no tool is called. At Opus 4.7 pricing, that is $8.40 wasted per 1,000 requests. Combined with verbose system prompts, redundant context, and uncontrolled output, most developers waste 40-60% of their token budget without realizing it.

The Setup

Developers focus on the user message when thinking about tokens. But the user message is often the smallest part of the prompt. The real token consumers are system prompts, tool definitions, conversation history, and output verbosity.

Anthropic’s documentation lists specific token overheads: configuring tool use automatically adds a 346-token system prompt, the bash tool adds 245 tokens, the text editor tool adds 700 tokens, and computer use adds 735 tokens. These costs are invisible in your application code but visible on your invoice.

This guide exposes the five most common token drains and shows how to eliminate each one.

The Math

Hidden token overhead for a typical request with 3 tools configured:

Hidden Cost	Tokens	Per 10K Opus Requests
Tool system prompt (auto)	346	$17.30
Bash tool definition	245	$12.25
Text editor definition	700	$35.00
Computer use definition	735	$36.75
Total tool overhead	2,026	$101.30

Add a 2,000-token verbose system prompt: $100.00 per 10K requests

Total hidden cost: $201.30 per 10,000 requests on Opus 4.7

At 300,000 requests/month: $6,039/month in hidden overhead alone

After optimization (conditional tools, compressed prompt): $1,206/month

Savings: $4,833/month (80%)
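
The arithmetic above can be reproduced with a short script. This is a minimal sketch: the $5 per million input tokens is inferred from the table (346 tokens across 10,000 requests costs $17.30) rather than quoted from a price list, and the constant and function names are illustrative.

```python
# Token counts come from the table above. The price is inferred from it:
# 346 tokens * 10,000 requests = 3.46M tokens = $17.30, i.e. $5 per million.
INPUT_PRICE_PER_MTOK = 5.00  # assumption: Opus 4.7 input price implied by the table

TOOL_OVERHEAD_TOKENS = {
    "tool_system_prompt": 346,
    "bash": 245,
    "text_editor": 700,
    "computer_use": 735,
}
VERBOSE_SYSTEM_PROMPT_TOKENS = 2_000


def cost_usd(tokens_per_request: int, requests: int) -> float:
    """Input cost in dollars for a fixed per-request token overhead."""
    return round(tokens_per_request * requests * INPUT_PRICE_PER_MTOK / 1_000_000, 2)


tool_tokens = sum(TOOL_OVERHEAD_TOKENS.values())
hidden_tokens = tool_tokens + VERBOSE_SYSTEM_PROMPT_TOKENS

per_10k = cost_usd(hidden_tokens, 10_000)
per_month = cost_usd(hidden_tokens, 300_000)

print(f"Tool overhead: {tool_tokens} tokens")
print(f"Hidden cost per 10K requests: ${per_10k:,.2f}")
print(f"Hidden cost at 300K requests/month: ${per_month:,.2f}")
```

Swapping in your own token counts and request volume gives a quick estimate of what the overhead costs before you optimize anything.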

The Technique

Drain 1: Tool Definitions Included When Not Needed

import anthropic

client = anthropic.Anthropic()

# BAD: Tools defined on every request (adds the tool system prompt plus every schema)
tools = [
    {"name": "search", "description": "Search the database for records matching a query",
     "input_schema": {"type": "object", "properties": {"query": {"type": "string"}, "limit": {"type": "integer"}}}},
    {"name": "update", "description": "Update a database record with new values",
     "input_schema": {"type": "object", "properties": {"id": {"type": "string"}, "fields": {"type": "object"}}}},
    {"name": "delete", "description": "Delete a record from the database",
     "input_schema": {"type": "object", "properties": {"id": {"type": "string"}}}},
]
# Every request pays the token cost even for "What time is it?"
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=100,
    tools=tools,
    messages=[{"role": "user", "content": "What time is it?"}],
)

# GOOD: Only include tools when the request might need them
def needs_tools(prompt: str) -> bool:
    """Cheap keyword check; it can miss requests, so keep the signal list permissive."""
    tool_signals = ["search for", "find", "update", "delete", "look up", "modify"]
    return any(signal in prompt.lower() for signal in tool_signals)

user_prompt = "What time is it?"
kwargs = {"model": "claude-sonnet-4-6", "max_tokens": 100,
          "messages": [{"role": "user", "content": user_prompt}]}
if needs_tools(user_prompt):
    kwargs["tools"] = tools
response = client.messages.create(**kwargs)
# No tool tokens are sent for this request

Drain 2: Conversation History Never Pruned

def prune_history(messages: list, max_tokens: int = 10000, keep_last: int = 4) -> list:
    """Keep recent messages and replace older ones with a truncated digest."""
    if len(messages) <= keep_last:
        return messages
    # Always keep the last N messages for continuity
    recent = messages[-keep_last:]
    older = messages[:-keep_last]
    # Estimate tokens in the older messages (content may be a string or a list of blocks)
    total_chars = sum(len(str(m["content"])) for m in older)
    estimated_tokens = total_chars // 4  # rough char-to-token ratio

    if estimated_tokens <= max_tokens:
        return messages  # fits within budget

    # Build a truncated digest of the older turns. This is plain truncation, not a
    # model-written summary; swap in a summarization call if you need higher fidelity.
    digest = ""
    for m in older:
        digest += f"{m['role']}: {str(m['content'])[:200]}\n"
    return [
        {"role": "user", "content": f"[Previous context digest: {digest[:500]}]"},
        {"role": "assistant", "content": "Understood, I have the context."},
        *recent,
    ]

Drain 3: Output Verbosity Not Controlled

# BAD: No output constraints (Claude may produce 2,000+ tokens)
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=4096,
    messages=[{"role": "user", "content": "What does the map function do in Python?"}],
)
# May get a 500-word essay when you needed 2 sentences

# GOOD: Constrain output explicitly
response = client.messages.create(
    model="claude-sonnet-4-6", max_tokens=200,
    system="Answer in 1-2 sentences. No examples unless asked.",
    messages=[{"role": "user", "content": "What does the map function do in Python?"}],
)
# Output: ~50 tokens instead of ~500
# Savings: 450 output tokens * $15/MTok = $0.00675 per request

Drain 4: Redundant Instructions

# BAD: Saying the same thing multiple ways
verbose = """Be concise. Keep your answers short. Don't write long responses.
Avoid unnecessary details. Get straight to the point. Be brief."""
# 6 ways to say "be concise" = 5 wasted instructions

# GOOD: Say it once
concise = "Max 3 sentences per response."

Drain 5: Including Full Documents When Snippets Suffice

import re

# BAD: Sending entire 100KB document for a specific question
with open("large_document.txt") as f:
    full_doc = f.read()  # ~25,000 tokens

# GOOD: Extract relevant section first
def extract_relevant_section(doc: str, question: str, window: int = 2000) -> str:
    """Return the paragraphs that best match the question, capped at about `window` tokens."""
    # Strip punctuation so "policy?" still matches "policy"; skip very short words
    keywords = [kw for kw in re.findall(r"\w+", question.lower()) if len(kw) > 2]
    paragraphs = doc.split("\n\n")
    scored = []
    for i, para in enumerate(paragraphs):
        score = sum(1 for kw in keywords if kw in para.lower())
        scored.append((score, i, para))
    # Highest score first; ties go to the earlier paragraph
    scored.sort(key=lambda x: (-x[0], x[1]))
    # Keep the top 3 paragraphs and restore document order
    top = sorted(scored[:3], key=lambda x: x[1])
    text = "\n\n".join(p[2] for p in top)
    return text[: window * 4]  # rough cap at ~4 characters per token

relevant = extract_relevant_section(full_doc, "What is the refund policy?")
# At most ~2,000 tokens instead of 25,000

The Tradeoffs

Aggressive history pruning can cause Claude to lose important conversation context, producing responses that feel disconnected. Keep the last 4-6 messages intact and only prune or summarize older history.

Conditional tool loading requires maintaining a routing function that correctly predicts when tools are needed. False negatives (not loading tools when they are needed) leave Claude unable to perform the requested action, so the request effectively fails. Start with a permissive classifier and tighten it over time.
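
One way to start permissive is to invert the router: attach tools by default and skip them only for prompts that match known tool-free patterns. The following is a minimal sketch, assuming a hypothetical `TOOL_FREE_PATTERNS` list that you would build from your own traffic. The patterns and both function names are illustrative and are not part of the Anthropic SDK.

```python
import re

# Hypothetical patterns for prompts that never need tools; tune these on real traffic.
TOOL_FREE_PATTERNS = [
    r"^(hi|hello|hey|thanks|thank you)\b",
    r"^what (time|day|date) is it\b",
    r"^(explain|define|what does .* mean)\b",
]


def should_attach_tools(prompt: str) -> bool:
    """Permissive router: attach tools unless the prompt is clearly tool-free."""
    text = prompt.strip().lower()
    return not any(re.search(pattern, text) for pattern in TOOL_FREE_PATTERNS)


def build_request(prompt: str, tools: list) -> dict:
    """Assemble keyword arguments for messages.create, adding tools only when routed."""
    kwargs = {
        "model": "claude-sonnet-4-6",
        "max_tokens": 100,
        "messages": [{"role": "user", "content": prompt}],
    }
    if should_attach_tools(prompt):
        kwargs["tools"] = tools
    return kwargs
```

With this design an unrecognized prompt costs extra tokens rather than a failed action, and each pattern you add to the list moves more traffic onto the cheaper path.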

Output length constraints may truncate legitimate long responses. Set max_tokens to the 95th percentile of your actual output distribution, not an arbitrary low number.
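
A 95th-percentile cap can be derived from logged output token counts. The following is a minimal sketch, assuming you already record `usage.output_tokens` from each response. The sample list and the `headroom_pct` parameter are illustrative.

```python
def p95_max_tokens(output_token_counts: list[int], headroom_pct: int = 10) -> int:
    """Return a max_tokens value at the 95th percentile of observed outputs, plus headroom."""
    if not output_token_counts:
        raise ValueError("need at least one observed output length")
    ordered = sorted(output_token_counts)
    n = len(ordered)
    # Nearest-rank percentile: ceil(0.95 * n), computed in integer arithmetic
    rank = (95 * n + 99) // 100
    p95 = ordered[rank - 1]
    # Add headroom and round up, again in integer arithmetic
    return (p95 * (100 + headroom_pct) + 99) // 100


# Illustrative sample: mostly short answers plus one long outlier
observed = [40, 55, 62, 48, 51, 70, 45, 58, 66, 53,
            49, 61, 57, 44, 68, 52, 59, 47, 63, 900]
print(p95_max_tokens(observed))
```

For this sample the cap comes out at 77 tokens, so the single 900-token outlier would be truncated while the other 19 responses fit. Whether that trade is acceptable depends on how costly a truncated answer is for your application.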

Implementation Checklist

Count tokens in each component of your top 5 request types
Remove tool definitions from requests that never use tools
Add conversation history pruning at 10,000 tokens
Set explicit output length constraints in system prompts
Remove duplicate and redundant instructions
Replace full documents with extracted relevant sections
Measure token reduction and validate quality

Measuring Impact

Track average input tokens per request broken down by component (system, tools, history, message). Identify which component decreased most after each optimization. Set a target of reducing total input tokens by 40-50%. Monitor your Anthropic billing dashboard weekly to confirm the token reduction translates to actual cost savings.
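
A per-component breakdown can be approximated offline before you wire up exact counts. The following is a minimal sketch, assuming the same rough 4-characters-per-token ratio used in the pruning example. The function names are hypothetical, and the estimate does not include the 346-token tool system prompt that is added automatically. For billing-grade numbers, rely on the `usage` field returned with each response.

```python
import json


def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an approximation, not a tokenizer)."""
    return len(text) // 4


def component_breakdown(system: str, tools: list, history: list, message: str) -> dict:
    """Estimate input tokens per request component so each optimization can be measured."""
    breakdown = {
        "system": estimate_tokens(system),
        "tools": estimate_tokens(json.dumps(tools)) if tools else 0,
        "history": sum(estimate_tokens(str(m["content"])) for m in history),
        "message": estimate_tokens(message),
    }
    breakdown["total"] = sum(breakdown.values())
    return breakdown


def largest_reduction(before: dict, after: dict) -> str:
    """Name the component whose estimated token count dropped the most."""
    components = [name for name in before if name != "total"]
    return max(components, key=lambda name: before[name] - after[name])


# Example: compress the system prompt and drop tools for a tool-free request
before = component_breakdown("x" * 8000, [{"name": "search"}], [], "What time is it?")
after = component_breakdown("x" * 2000, [], [], "What time is it?")
print(largest_reduction(before, after))
print(f"Total reduction: {1 - after['total'] / before['total']:.0%}")
```

Logging the breakdown per request type makes it easier to see which optimization actually moved the total, rather than inferring it from the invoice.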

See Also

Claude Code Token Usage Optimization: comprehensive optimization guide
Reduce Claude Code Hallucinations Save Tokens: clearer prompts waste fewer tokens
Claude Skill Token Usage Profiling: identify per-skill token waste
