Tool Use vs Direct Prompting Cost Comparison

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS

Tool use adds a minimum of 659 tokens (346 system overhead + 313 for a single minimal tool) to every request. With a typical tool definition of 400 tokens, that’s 746 tokens of overhead. At Opus 4.7 rates ($5.00/MTok), you’re paying $0.00373 per request just for the privilege of having tools available. If the model can accomplish the same task via direct prompting – generating structured output without tool calls – you save that overhead entirely. Across 10,000 daily requests, that’s $37.30/day or $1,119/month.
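The arithmetic above can be checked in a few lines (the token counts and the $5.00/MTok input rate are the article's figures; adjust the constants for your model):

```python
# Reproduces the overhead math above. Token counts and rate are the
# article's figures, not measured values.
SYSTEM_OVERHEAD = 346           # tokens added when any tool is present
TOOL_DEFINITION = 400           # typical single tool definition
INPUT_RATE = 5.00 / 1_000_000   # $ per input token

overhead_tokens = SYSTEM_OVERHEAD + TOOL_DEFINITION    # 746
cost_per_request = overhead_tokens * INPUT_RATE        # $0.00373
daily_cost = cost_per_request * 10_000                 # $37.30
monthly_cost = daily_cost * 30                         # $1,119

print(f"${cost_per_request:.5f}/request, ${daily_cost:.2f}/day, ${monthly_cost:,.2f}/month")
```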

The Setup

Not every task that returns structured data needs tool use. Claude can output valid JSON, XML, CSV, or any structured format through direct prompting with output constraints. Tool use is essential when the model needs to execute external actions (database queries, API calls, file operations), but many “tool use” implementations actually just extract structured data from text – something Claude does natively. The cost difference matters because tool use inflates every request with system overhead tokens, tool definition tokens, and tool_use/tool_result block tokens. Direct prompting carries none of this overhead.

The Math

Compare extracting order details from an email using both approaches:

Tool use approach (Sonnet 4.6, $3.00/$15.00 per MTok):

| Component | Tokens | Cost |
|---|---|---|
| System overhead | 346 | $0.001038 |
| extract_order tool definition | 450 | $0.001350 |
| Email text (input) | 800 | $0.002400 |
| Prompt instructions | 150 | $0.000450 |
| Total input | 1,746 | $0.005238 |
| tool_use response block (output) | 250 | $0.003750 |
| Total | 1,996 | $0.008988 |

Direct prompting approach (same model):

| Component | Tokens | Cost |
|---|---|---|
| System prompt | 100 | $0.000300 |
| Email text (input) | 800 | $0.002400 |
| Prompt instructions | 200 | $0.000600 |
| Total input | 1,100 | $0.003300 |
| JSON response (output) | 200 | $0.003000 |
| Total | 1,300 | $0.006300 |

Savings per request: $0.002688 (30%). At 10,000 requests/day, that's $806/month.

For Haiku 4.5 ($1.00/$5.00 per MTok), the same token counts give:

| Approach | Input Cost | Output Cost | Total |
|---|---|---|---|
| Tool use (1,746 in / 250 out) | $0.001746 | $0.001250 | $0.002996 |
| Direct prompting (1,100 in / 200 out) | $0.001100 | $0.001000 | $0.002100 |

Savings per request: $0.000896 (30%). At 10,000 requests/day, that's roughly $269/month.

The Technique

Replace tool use with structured output prompting for data extraction tasks.

```python
import anthropic
import json

client = anthropic.Anthropic()

# APPROACH 1: Tool use (adds ~800 tokens overhead)
def extract_with_tools(email_text: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        tools=[{
            "name": "extract_order",
            "description": "Extract order details from email",
            "input_schema": {
                "type": "object",
                "properties": {
                    "order_id": {"type": "string"},
                    "customer_name": {"type": "string"},
                    "items": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "properties": {
                                "name": {"type": "string"},
                                "quantity": {"type": "integer"},
                                "price": {"type": "number"}
                            }
                        }
                    },
                    "total": {"type": "number"}
                },
                "required": ["order_id", "customer_name", "items", "total"]
            }
        }],
        # Force the model to call extract_order rather than answer in prose
        tool_choice={"type": "tool", "name": "extract_order"},
        messages=[{"role": "user", "content": email_text}]
    )

    for block in response.content:
        if block.type == "tool_use":
            return block.input
    return {}


# APPROACH 2: Direct prompting (zero tool overhead)
def extract_with_prompt(email_text: str) -> dict:
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=500,
        messages=[{
            "role": "user",
            "content": f"""Extract order details from this email as JSON.
Return ONLY valid JSON with these fields:
- order_id (string)
- customer_name (string)
- items (array of objects with name, quantity, price)
- total (number)

Email:
{email_text}"""
        }]
    )
    # Assumes the model honors "ONLY valid JSON"; see The Tradeoffs for a retry fallback
    return json.loads(response.content[0].text)


# DECISION FUNCTION: When to use which approach
def extract_order(email_text: str, needs_validation: bool = False) -> dict:
    """
    Use tool use only when you need schema validation.
    Use direct prompting for simple extraction.
    """
    if needs_validation:
        return extract_with_tools(email_text)
    return extract_with_prompt(email_text)
```

Use this decision matrix to pick the right approach:

| Scenario | Use Tool Use | Use Direct Prompting |
|---|---|---|
| Extract structured data from text | No | Yes |
| Call an external API | Yes | No |
| Execute database queries | Yes | No |
| Format conversion (text to JSON) | No | Yes |
| Multi-step workflows with side effects | Yes | No |
| Classification/categorization output | No | Yes |
| File system operations | Yes | No |

The Tradeoffs

Direct prompting doesn’t enforce schema validation. The model might return malformed JSON, missing fields, or incorrect types. Tool use provides implicit schema validation – if the model tries to call a tool with invalid parameters, you get a clear error. For production systems processing thousands of requests, the reliability of tool use might justify the token overhead. A middle ground: use direct prompting with a JSON parsing step that retries on validation failure. The retry costs less than carrying tool overhead on every request when failures are rare (under 2% typically).

Implementation Checklist

- Identify tool definitions that only extract structured data, with no external side effects.
- Log usage.input_tokens on both paths for a sample of identical requests.
- Build a ground truth dataset and confirm direct prompting matches tool use accuracy (98%+).
- Add JSON parsing with validation and a retry on failure.
- Keep tool use for API calls, database queries, file operations, and multi-step workflows.

Measuring Impact

Compare usage.input_tokens for identical tasks processed via both paths. The tool use path should consistently show 500-1,500 more input tokens. Track extraction accuracy on both paths using a ground truth dataset. If direct prompting achieves 98%+ accuracy (matching tool use), the conversion is worthwhile. At Sonnet 4.6 rates, converting 5,000 daily requests from tool use to direct prompting saves approximately $400/month with no quality loss for well-defined extraction tasks.
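A sketch of the savings calculation from recorded usage figures (the Messages API reports `usage.input_tokens` and `usage.output_tokens` on each response; `monthly_savings` is an illustrative helper, seeded here with the article's Sonnet 4.6 numbers):

```python
def monthly_savings(tool_usage: dict, direct_usage: dict,
                    daily_requests: int,
                    input_rate: float = 3.00 / 1e6,
                    output_rate: float = 15.00 / 1e6) -> float:
    """Per-request cost delta (tool path minus direct path) x volume x 30 days.
    Usage dicts mirror the API's usage object: input_tokens / output_tokens."""
    def cost(u):
        return u["input_tokens"] * input_rate + u["output_tokens"] * output_rate
    return (cost(tool_usage) - cost(direct_usage)) * daily_requests * 30

# The article's measured token counts, at 5,000 requests/day
delta = monthly_savings(
    {"input_tokens": 1746, "output_tokens": 250},
    {"input_tokens": 1100, "output_tokens": 200},
    daily_requests=5000,
)
print(f"${delta:.2f}/month")  # ~ $403/month, the "$400/month" figure above
```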