Per-Request Cost Tracking for Claude API

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS Join 50+ builders · More at zovo.one

Every Claude API response includes a usage object with exact token counts. Multiply those counts by the model’s per-token price and you get the precise cost of that request – no estimation needed. The formula: (input_tokens x input_rate) + (output_tokens x output_rate) + (cache_tokens x cache_rate) + (web_searches x $0.01). For an Opus 4.7 request with 10,000 input and 2,000 output tokens, that’s $0.05 + $0.05 = $0.10 per request. Track this per-request and you’ll discover that 5% of your requests generate 40% of your bill.

The Setup

The usage object returned in every API response contains five key fields: input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, and server_tool_use (which includes web_search_requests, web_fetch_requests, and code_execution_requests). These fields map directly to billable units. Most developers log only the response content and discard the usage data. By preserving and calculating costs from the usage object on every request, you build a complete cost ledger that enables optimization at the individual request level.

The Math

The complete cost formula per request:

Total = (input_tokens * input_price / 1,000,000)
      + (output_tokens * output_price / 1,000,000)
      + (cache_creation_input_tokens * cache_write_price / 1,000,000)
      + (cache_read_input_tokens * cache_read_price / 1,000,000)
      + (web_search_requests * $0.01)
      + (code_execution_hours * $0.05)  [after 1,550 free hours/month]

Example with Sonnet 4.6 ($3.00 input, $15.00 output, $3.75 cache write, $0.30 cache read):

Field Value Calculation Cost
input_tokens 5,000 5K x $3.00/MTok $0.015000
output_tokens 1,500 1.5K x $15.00/MTok $0.022500
cache_creation 10,000 10K x $3.75/MTok $0.037500
cache_read 8,000 8K x $0.30/MTok $0.002400
web_search 2 2 x $0.01 $0.020000
Total     $0.097400

Without cache reads, the same input would cost: 23,000 x $3.00/MTok = $0.069 for input alone. The cache saved $0.069 - $0.015 - $0.037500 - $0.002400 = $0.014100 on this single request.

The Technique

Build a per-request cost calculator that works with any Claude model.

import anthropic
from dataclasses import dataclass
from typing import Optional

# All prices per million tokens (verified 2026-04-19)
MODEL_PRICES = {
    "claude-opus-4-7": {
        "input": 5.00, "output": 25.00,
        "cache_write_5m": 6.25, "cache_write_1h": 10.00,
        "cache_read": 0.50,
        "batch_input": 2.50, "batch_output": 12.50
    },
    "claude-sonnet-4-6": {
        "input": 3.00, "output": 15.00,
        "cache_write_5m": 3.75, "cache_write_1h": 6.00,
        "cache_read": 0.30,
        "batch_input": 1.50, "batch_output": 7.50
    },
    "claude-haiku-4-5": {
        "input": 1.00, "output": 5.00,
        "cache_write_5m": 1.25, "cache_write_1h": 2.00,
        "cache_read": 0.10,
        "batch_input": 0.50, "batch_output": 2.50
    },
}

@dataclass
class RequestCost:
    input_cost: float
    output_cost: float
    cache_write_cost: float
    cache_read_cost: float
    search_cost: float
    total_cost: float
    input_tokens: int
    output_tokens: int

    def __str__(self):
        return (
            f"Input: {self.input_tokens:,} tokens (${self.input_cost:.4f}) | "
            f"Output: {self.output_tokens:,} tokens (${self.output_cost:.4f}) | "
            f"Cache: write ${self.cache_write_cost:.4f}, "
            f"read ${self.cache_read_cost:.4f} | "
            f"Search: ${self.search_cost:.4f} | "
            f"Total: ${self.total_cost:.4f}"
        )


def calculate_request_cost(
    model: str,
    usage,
    is_batch: bool = False,
    cache_ttl: str = "5m"
) -> RequestCost:
    """Calculate exact cost from a response's usage object."""
    prices = MODEL_PRICES.get(model)
    if not prices:
        raise ValueError(f"Unknown model: {model}")

    # Adjust for batch pricing (50% discount)
    input_price = prices["batch_input"] if is_batch else prices["input"]
    output_price = prices["batch_output"] if is_batch else prices["output"]

    cache_write_key = f"cache_write_{cache_ttl}"
    cache_write_price = prices.get(cache_write_key, prices["cache_write_5m"])

    input_cost = usage.input_tokens * input_price / 1_000_000
    output_cost = usage.output_tokens * output_price / 1_000_000

    cache_read_tokens = getattr(usage, "cache_read_input_tokens", 0) or 0
    cache_write_tokens = getattr(usage, "cache_creation_input_tokens", 0) or 0
    cache_write_cost = cache_write_tokens * cache_write_price / 1_000_000
    cache_read_cost = cache_read_tokens * prices["cache_read"] / 1_000_000

    server_tools = getattr(usage, "server_tool_use", None)
    web_searches = 0
    if server_tools:
        web_searches = getattr(server_tools, "web_search_requests", 0) or 0
    search_cost = web_searches * 0.01

    total = input_cost + output_cost + cache_write_cost + cache_read_cost + search_cost

    return RequestCost(
        input_cost=round(input_cost, 6),
        output_cost=round(output_cost, 6),
        cache_write_cost=round(cache_write_cost, 6),
        cache_read_cost=round(cache_read_cost, 6),
        search_cost=round(search_cost, 4),
        total_cost=round(total, 6),
        input_tokens=usage.input_tokens,
        output_tokens=usage.output_tokens,
    )


# Usage
client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Explain prompt caching in detail"}]
)

cost = calculate_request_cost("claude-sonnet-4-6", response.usage)
print(cost)
# Input: 12 tokens ($0.0000) | Output: 847 tokens ($0.0127) |
# Cache: write $0.0000, read $0.0000 | Search: $0.0000 |
# Total: $0.0128

Once you have per-request costs, find your expensive outliers:

# Find the top 5% most expensive requests
costs_sorted = sorted(all_costs, key=lambda c: c.total_cost, reverse=True)
top_5_pct = costs_sorted[:len(costs_sorted) // 20]
print(f"Top 5% of requests cost: ${sum(c.total_cost for c in top_5_pct):.2f}")
print(f"Total spend: ${sum(c.total_cost for c in costs_sorted):.2f}")
print(f"Top 5% share: {sum(c.total_cost for c in top_5_pct) / sum(c.total_cost for c in costs_sorted):.0%}")

The Tradeoffs

Per-request tracking adds a small computational overhead to every API call (microseconds for the cost calculation, milliseconds for the database write). For latency-sensitive applications serving real-time user requests, batch the database writes using an in-memory buffer that flushes every 100 requests or 10 seconds. The pricing dictionary needs manual updates when Anthropic changes rates. Consider fetching prices from a config service rather than hardcoding them.

Implementation Checklist

Measuring Impact

After one week of per-request tracking, generate a Pareto analysis. Typically, 10-20% of request types generate 60-80% of costs. These are your optimization targets. A common finding: long-running agentic sessions with growing context windows produce requests costing $0.50-$2.00 each, while 90% of requests cost under $0.05. Optimizing just the top 10% of requests (context windowing, model routing, caching) typically cuts total spend by 25-40%.