How Tool Definitions Add 346 Tokens Per Call

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS · Join 50+ builders · More at zovo.one

Send a Claude API request with tools: [] and your input token count jumps by 346 tokens before you type a single word. That’s the system prompt overhead Anthropic injects whenever tool use is active. At Sonnet 4.6’s $3.00 per million input tokens, those 346 tokens cost $0.001038 per request. Across 50,000 requests per day, you’re paying $51.90 daily – $1,557 monthly – for tokens you never wrote.

The Setup

When you pass a tools parameter in your API request, Claude adds an internal system prompt that explains to the model how to format tool calls. The size depends on the tool_choice setting: auto or none adds 346 tokens, while any or a specific tool name adds 313 tokens. This overhead exists on every request regardless of whether a tool is actually invoked. It’s billed at the standard input token rate for whatever model you’re using. The overhead is separate from and additive to the tokens consumed by individual tool definitions (names, descriptions, and JSON schemas).
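The per-setting overheads above can be captured in a small lookup for cost estimation. The token counts come from the text; the helper function itself is an illustrative sketch:

```python
# Overhead (in tokens) of the injected tool-use system prompt,
# keyed by tool_choice type (counts from the figures above).
TOOL_CHOICE_OVERHEAD = {
    "auto": 346,
    "none": 346,
    "any": 313,
    "tool": 313,  # forcing a specific named tool
}

def overhead_cost_usd(tool_choice_type: str, price_per_mtok: float) -> float:
    """Dollar cost of the tool-use system prompt for one request."""
    tokens = TOOL_CHOICE_OVERHEAD[tool_choice_type]
    return tokens * price_per_mtok / 1_000_000

# At Sonnet 4.6's $3.00/MTok, tool_choice "auto" costs $0.001038 per request.
```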

The Math

Compare identical requests with and without tool use enabled on Opus 4.7 ($5.00/MTok input):

Without tools: ~300 input tokens → $0.00150 per request

With tools (3 tools, ~300 tokens each): ~1,546 input tokens (300 base + 346 system overhead + ~900 of tool definitions) → $0.00773 per request

The tool overhead alone (1,246 tokens) costs $0.00623 – a 415% increase per request.

At 10,000 requests/day over a month: 1,246 extra tokens per request works out to $62.30 per day, or $1,869 per month, spent entirely on overhead.
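The rollup arithmetic is straightforward to sketch, using the rates and volumes from the example above:

```python
OPUS_INPUT_PER_MTOK = 5.00   # $/MTok input, Opus 4.7 rate from the example
OVERHEAD_TOKENS = 1_246      # 346 system overhead + 3 tools x ~300 tokens

def monthly_overhead_usd(requests_per_day: int, days: int = 30) -> float:
    """Monthly dollar cost of tool-use overhead tokens alone."""
    per_request = OVERHEAD_TOKENS * OPUS_INPUT_PER_MTOK / 1_000_000
    return per_request * requests_per_day * days

print(f"${monthly_overhead_usd(10_000):,.2f}/month")  # $1,869.00/month
```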

The Technique

Measure and eliminate this overhead with a two-step approach: first audit, then conditionally disable.

import anthropic
import json

client = anthropic.Anthropic()

def measure_tool_overhead():
    """Compare token usage with and without tools."""

    base_message = [{"role": "user", "content": "What is 2+2?"}]

    # Request WITHOUT tools
    resp_no_tools = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=50,
        messages=base_message
    )

    # Request WITH tools
    simple_tool = [{
        "name": "calculator",
        "description": "Calculate",
        "input_schema": {
            "type": "object",
            "properties": {"expr": {"type": "string"}},
            "required": ["expr"]
        }
    }]

    resp_with_tools = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=50,
        tools=simple_tool,
        messages=base_message
    )

    overhead = (resp_with_tools.usage.input_tokens
                - resp_no_tools.usage.input_tokens)

    print(f"Without tools: {resp_no_tools.usage.input_tokens} input tokens")
    print(f"With tools:    {resp_with_tools.usage.input_tokens} input tokens")
    print(f"Tool overhead: {overhead} tokens")
    print(f"Overhead cost at Opus $5/MTok: ${overhead * 5 / 1_000_000:.6f}")
    return overhead


def smart_request(user_message: str, needs_tools: bool = False):
    """Only include tools when actually needed."""
    kwargs = {
        "model": "claude-sonnet-4-6",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_message}]
    }

    if needs_tools:
        kwargs["tools"] = load_tools()  # your tool definitions

    return client.messages.create(**kwargs)


def load_tools():
    """Load minimal tool definitions from config."""
    with open("tools.json") as f:
        return json.load(f)

The key insight: if your application handles both tool-use and non-tool-use requests, split them into separate code paths. Only attach the tools parameter when the request genuinely requires function calling. For classification, summarization, or generation tasks that never need tools, omit the parameter entirely and save 346+ tokens per call.

For requests that do need tools, minimize definition size. Strip unnecessary descriptions, remove optional schema properties the model rarely uses, and use $ref to avoid duplicating shared sub-schemas.
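As an illustration of schema deduplication, here is one hypothetical tool whose two filter properties share a date-range sub-schema via `$defs` and `$ref` instead of repeating it. The tool name and fields are invented for the example; Claude's `input_schema` accepts standard JSON Schema:

```python
# Hypothetical tool definition: the date_range sub-schema is written
# once under $defs and referenced twice, instead of being duplicated.
tool = {
    "name": "search_orders",          # invented for illustration
    "description": "Search orders",   # keep descriptions terse
    "input_schema": {
        "type": "object",
        "$defs": {
            "date_range": {
                "type": "object",
                "properties": {
                    "start": {"type": "string"},
                    "end": {"type": "string"},
                },
                "required": ["start", "end"],
            }
        },
        "properties": {
            # Both properties point at the shared definition.
            "created": {"$ref": "#/$defs/date_range"},
            "updated": {"$ref": "#/$defs/date_range"},
        },
    },
}
```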

The Tradeoffs

Splitting tool and non-tool code paths adds routing complexity. You need a reliable way to determine which requests need tools before sending them. Misrouting a tool-needing request to the non-tool path means the model cannot call any function. Some conversation flows also start without tools but later require them – you’ll need to handle mid-conversation tool injection, which means managing message history across both modes.

One practical concern: the tool_choice parameter itself affects overhead. Using any instead of auto saves 33 tokens per request (313 vs. 346), which at Opus 4.7 rates equals $0.000165 per request. That’s $49.50/month at 10,000 requests/day – a small win but entirely free to implement. If you know every request needs exactly one specific tool, use tool_choice: {"type": "tool", "name": "your_tool"} to get the 313-token overhead instead of 346.
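The tool_choice saving can be sketched as follows; the tool name in the request kwargs is a placeholder, and the token counts come from the figures above:

```python
# 346 (auto/none) vs. 313 (any/specific) overhead tokens, from the text above.
SAVED_TOKENS = 346 - 313  # 33

def monthly_tool_choice_savings(requests_per_day: int,
                                price_per_mtok: float,
                                days: int = 30) -> float:
    """Dollar savings from the smaller any/specific-tool system prompt."""
    return SAVED_TOKENS * price_per_mtok / 1_000_000 * requests_per_day * days

print(f"${monthly_tool_choice_savings(10_000, 5.00):.2f}")  # $49.50

# Forcing one specific tool gets the 313-token overhead:
request_kwargs = {
    "model": "claude-sonnet-4-6",
    "max_tokens": 1024,
    "tool_choice": {"type": "tool", "name": "calculator"},
    # tools list and messages omitted here for brevity
}
```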

When combining tool overhead reduction with batch processing, the savings stack. Batch mode cuts input token costs by 50%, reducing Opus 4.7 from $5.00/MTok to $2.50/MTok. The 346-token overhead at batch rates costs $0.000865 per request instead of $0.00173. At 50,000 daily requests, batch mode alone saves $43.25/day on the system overhead tokens. Combined with splitting tool and non-tool paths, a team processing 50,000 daily requests could save $2,595/month on overhead tokens alone by batching non-tool requests and omitting the tools parameter entirely.
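The batch-rate arithmetic above can be sketched directly, with rates and volumes taken from that paragraph:

```python
OVERHEAD_TOKENS = 346   # tool-use system prompt overhead, from the text
FULL_RATE = 5.00        # Opus 4.7 $/MTok input
BATCH_RATE = 2.50       # after the 50% batch discount

def daily_batch_savings(requests_per_day: int) -> float:
    """Daily savings from billing the overhead tokens at batch rates."""
    full = OVERHEAD_TOKENS * FULL_RATE / 1_000_000     # $0.00173/request
    batched = OVERHEAD_TOKENS * BATCH_RATE / 1_000_000  # $0.000865/request
    return (full - batched) * requests_per_day

print(f"${daily_batch_savings(50_000):.2f}/day")  # $43.25/day
```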

Implementation Checklist

1. Run the overhead measurement against your actual model and tool set – don’t assume exactly 346 tokens.
2. Audit which request types (classification, summarization, generation) never invoke tools.
3. Split tool and non-tool requests into separate code paths, attaching the tools parameter only where needed.
4. Switch tool_choice from auto to any or a specific tool wherever the workflow allows.
5. Trim tool descriptions and schemas; deduplicate shared sub-schemas with $ref.
6. Route latency-tolerant requests through batch processing for the 50% input discount.

Measuring Impact

Track two metrics: average input tokens per request on the tool path vs. the non-tool path, and the percentage of requests routed to each. Multiply the delta by your model’s input price. If you route 40% of 50,000 daily requests away from tools, saving 1,246 tokens each at Sonnet 4.6 rates ($3.00/MTok), that’s 20,000 x 1,246 x $3.00/MTok = $74.76 per day, or $2,243 per month.
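A minimal sketch of the tracking, assuming you log each response's input token count along with which path served it (the class and method names are invented):

```python
class PathMetrics:
    """Tracks input tokens per routing path to quantify savings."""

    def __init__(self, price_per_mtok: float):
        self.price_per_mtok = price_per_mtok
        # path -> [request_count, total_input_tokens]
        self.totals = {"tool": [0, 0], "no_tool": [0, 0]}

    def record(self, path: str, input_tokens: int) -> None:
        self.totals[path][0] += 1
        self.totals[path][1] += input_tokens

    def avg_tokens(self, path: str) -> float:
        count, tokens = self.totals[path]
        return tokens / count if count else 0.0

    def savings_usd(self) -> float:
        """Dollars saved by requests routed away from the tool path."""
        delta = self.avg_tokens("tool") - self.avg_tokens("no_tool")
        no_tool_requests = self.totals["no_tool"][0]
        return delta * no_tool_requests * self.price_per_mtok / 1_000_000
```

Feed it with `usage.input_tokens` from each response; the delta times the non-tool request count reproduces the per-day figure above.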