Per-Request Cost Tracking for Claude (2026)

Last updated: April 19, 2026

Every Claude API response includes a usage object with exact token counts. Multiply those counts by the model’s per-token price and you get the cost of that request directly, with no estimation needed. The formula: (input_tokens x input_rate) + (output_tokens x output_rate) + (cache_tokens x cache_rate) + (web_searches x $0.01). For an Opus 4.7 request with 10,000 input and 2,000 output tokens, priced at $5.00 and $25.00 per million tokens, that’s $0.05 + $0.05 = $0.10. Track this per request and you may find that a small fraction of requests (for example, 5%) accounts for a disproportionate share of your bill (for example, 40%).

The Setup

The usage object returned in every API response contains five key fields: input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, and server_tool_use (which includes web_search_requests, web_fetch_requests, and code_execution_requests). These fields map directly to billable units. Most developers log only the response content and discard the usage data. By preserving and calculating costs from the usage object on every request, you build a complete cost ledger that enables optimization at the individual request level.
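As a minimal sketch of preserving the usage data, the helper below flattens a usage object into a plain dict that can be logged next to the response. The function name `usage_to_record` and the stand-in `fake_usage` object are illustrative assumptions, not part of the SDK; in real code you would pass `response.usage`.

```python
from types import SimpleNamespace


def usage_to_record(usage) -> dict:
    """Flatten a usage object into a plain dict suitable for logging.

    Missing or None fields are recorded as 0 so downstream math never fails.
    """
    server_tools = getattr(usage, "server_tool_use", None)
    return {
        "input_tokens": getattr(usage, "input_tokens", 0) or 0,
        "output_tokens": getattr(usage, "output_tokens", 0) or 0,
        "cache_read_input_tokens": getattr(usage, "cache_read_input_tokens", 0) or 0,
        "cache_creation_input_tokens": getattr(usage, "cache_creation_input_tokens", 0) or 0,
        # server_tool_use may be absent when no server-side tools were used
        "web_search_requests": getattr(server_tools, "web_search_requests", 0) or 0,
    }


# Hypothetical stand-in for response.usage, used here so the sketch runs offline
fake_usage = SimpleNamespace(
    input_tokens=5000,
    output_tokens=1500,
    cache_read_input_tokens=8000,
    cache_creation_input_tokens=None,
    server_tool_use=None,
)
record = usage_to_record(fake_usage)
print(record)
```

Storing the raw counts rather than only the computed cost lets you re-price historical requests if rates change.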

The Math

The complete cost formula per request:

Total = (input_tokens * input_price / 1,000,000)
      + (output_tokens * output_price / 1,000,000)
      + (cache_creation_input_tokens * cache_write_price / 1,000,000)
      + (cache_read_input_tokens * cache_read_price / 1,000,000)
      + (web_search_requests * $0.01)
      + (code_execution_hours * $0.05)  [after 1,550 free hours/month]

Example with Sonnet 4.6 ($3.00 input, $15.00 output, $3.75 cache write, $0.30 cache read):

Field	Value	Calculation	Cost
input_tokens	5,000	5K x $3.00/MTok	$0.015000
output_tokens	1,500	1.5K x $15.00/MTok	$0.022500
cache_creation	10,000	10K x $3.75/MTok	$0.037500
cache_read	8,000	8K x $0.30/MTok	$0.002400
web_search	2	2 x $0.01	$0.020000
Total			$0.097400

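Without caching, the same 23,000 input tokens (5,000 + 10,000 + 8,000) would cost 23,000 x $3.00/MTok = $0.069 for input alone. With caching, the input side costs $0.015 + $0.037500 + $0.002400 = $0.054900, so the cache saved $0.069 - $0.054900 = $0.014100 on this single request. The saving comes entirely from the 8,000 cache-read tokens; the 10,000 cache-write tokens cost more than regular input and pay off only when later requests read them.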
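The arithmetic in the table and the cache-savings figure can be checked with a few lines of Python. This is only a verification sketch that reuses the Sonnet 4.6 prices and token counts from the example above; the variable names are illustrative.

```python
PER_MTOK = 1_000_000

# Prices (USD per million tokens) and token counts from the worked example
prices = {"input": 3.00, "output": 15.00, "cache_write": 3.75, "cache_read": 0.30}
tokens = {"input": 5_000, "output": 1_500, "cache_write": 10_000, "cache_read": 8_000}
web_searches = 2

# Cost of each token category, then the request total including web searches
token_costs = {name: tokens[name] * prices[name] / PER_MTOK for name in tokens}
total = sum(token_costs.values()) + web_searches * 0.01

# Input-side cost if all 23,000 input tokens were billed at the base input rate
all_input_tokens = tokens["input"] + tokens["cache_write"] + tokens["cache_read"]
uncached_input = all_input_tokens * prices["input"] / PER_MTOK
cached_input = token_costs["input"] + token_costs["cache_write"] + token_costs["cache_read"]
savings = uncached_input - cached_input

print(f"Total: ${total:.6f}")            # Total: $0.097400
print(f"Cache savings: ${savings:.6f}")  # Cache savings: $0.014100
```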

The Technique

Build a per-request cost calculator that works with any Claude model.

import anthropic
from dataclasses import dataclass

# All prices in USD per million tokens (verified 2026-04-19)
MODEL_PRICES = {
    "claude-opus-4-7": {
        "input": 5.00, "output": 25.00,
        "cache_write_5m": 6.25, "cache_write_1h": 10.00,
        "cache_read": 0.50,
        "batch_input": 2.50, "batch_output": 12.50
    },
    "claude-sonnet-4-6": {
        "input": 3.00, "output": 15.00,
        "cache_write_5m": 3.75, "cache_write_1h": 6.00,
        "cache_read": 0.30,
        "batch_input": 1.50, "batch_output": 7.50
    },
    "claude-haiku-4-5": {
        "input": 1.00, "output": 5.00,
        "cache_write_5m": 1.25, "cache_write_1h": 2.00,
        "cache_read": 0.10,
        "batch_input": 0.50, "batch_output": 2.50
    },
}


@dataclass
class RequestCost:
    input_cost: float
    output_cost: float
    cache_write_cost: float
    cache_read_cost: float
    search_cost: float
    total_cost: float
    input_tokens: int
    output_tokens: int

    def __str__(self):
        return (
            f"Input: {self.input_tokens:,} tokens (${self.input_cost:.4f}) | "
            f"Output: {self.output_tokens:,} tokens (${self.output_cost:.4f}) | "
            f"Cache: write ${self.cache_write_cost:.4f}, "
            f"read ${self.cache_read_cost:.4f} | "
            f"Search: ${self.search_cost:.4f} | "
            f"Total: ${self.total_cost:.4f}"
        )


def calculate_request_cost(
    model: str,
    usage,
    is_batch: bool = False,
    cache_ttl: str = "5m"
) -> RequestCost:
    """Calculate exact cost from a response's usage object."""
    prices = MODEL_PRICES.get(model)
    if not prices:
        raise ValueError(f"Unknown model: {model}")
    # Adjust for batch pricing (50% discount)
    input_price = prices["batch_input"] if is_batch else prices["input"]
    output_price = prices["batch_output"] if is_batch else prices["output"]
    cache_write_key = f"cache_write_{cache_ttl}"
    cache_write_price = prices.get(cache_write_key, prices["cache_write_5m"])
    input_cost = usage.input_tokens * input_price / 1_000_000
    output_cost = usage.output_tokens * output_price / 1_000_000
    cache_read_tokens = getattr(usage, "cache_read_input_tokens", 0) or 0
    cache_write_tokens = getattr(usage, "cache_creation_input_tokens", 0) or 0
    cache_write_cost = cache_write_tokens * cache_write_price / 1_000_000
    cache_read_cost = cache_read_tokens * prices["cache_read"] / 1_000_000
    server_tools = getattr(usage, "server_tool_use", None)
    web_searches = 0
    if server_tools:
        web_searches = getattr(server_tools, "web_search_requests", 0) or 0
    search_cost = web_searches * 0.01
    # Code execution is billed by time rather than tokens and is not included here
    total = input_cost + output_cost + cache_write_cost + cache_read_cost + search_cost
    return RequestCost(
        input_cost=round(input_cost, 6),
        output_cost=round(output_cost, 6),
        cache_write_cost=round(cache_write_cost, 6),
        cache_read_cost=round(cache_read_cost, 6),
        search_cost=round(search_cost, 4),
        total_cost=round(total, 6),
        input_tokens=usage.input_tokens,
        output_tokens=usage.output_tokens,
    )


# Usage
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Explain prompt caching in detail"}]
)
cost = calculate_request_cost("claude-sonnet-4-6", response.usage)
print(cost)
# Example output (token counts will vary between runs):
# Input: 12 tokens ($0.0000) | Output: 847 tokens ($0.0127) |
# Cache: write $0.0000, read $0.0000 | Search: $0.0000 |
# Total: $0.0127

Once you have per-request costs, find your expensive outliers:

# all_costs: a list of RequestCost objects collected from your request log
# Find the top 5% most expensive requests (always at least one request)
costs_sorted = sorted(all_costs, key=lambda c: c.total_cost, reverse=True)
top_5_pct = costs_sorted[:max(1, len(costs_sorted) // 20)]
top_spend = sum(c.total_cost for c in top_5_pct)
total_spend = sum(c.total_cost for c in costs_sorted)
print(f"Top 5% of requests cost: ${top_spend:.2f}")
print(f"Total spend: ${total_spend:.2f}")
if total_spend > 0:  # avoid dividing by zero on an empty or zero-cost log
    print(f"Top 5% share: {top_spend / total_spend:.0%}")

The Tradeoffs

Per-request tracking adds a small computational overhead to every API call (microseconds for the cost calculation, milliseconds for the database write). For latency-sensitive applications serving real-time user requests, batch the database writes using an in-memory buffer that flushes every 100 requests or 10 seconds. The pricing dictionary needs manual updates when Anthropic changes rates. Consider fetching prices from a config service rather than hardcoding them.
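One way to batch the writes is a small in-memory buffer like the sketch below. The class name `CostBuffer` and the `flush_fn` callback are hypothetical; `flush_fn` stands in for whatever bulk-insert call your database layer provides. Note that this version checks the age limit only when a new record arrives, so an idle service would also need a background timer or a call to `flush()` on shutdown.

```python
import threading
import time


class CostBuffer:
    """Collect cost records in memory and flush them in batches.

    A flush is triggered when the buffer holds max_items records or when
    max_age_seconds have passed since the previous flush (checked on add).
    """

    def __init__(self, flush_fn, max_items=100, max_age_seconds=10.0):
        self._flush_fn = flush_fn  # hypothetical callback, e.g. a bulk DB insert
        self._max_items = max_items
        self._max_age = max_age_seconds
        self._items = []
        self._last_flush = time.monotonic()
        self._lock = threading.Lock()

    def add(self, record: dict) -> None:
        with self._lock:
            self._items.append(record)
            too_many = len(self._items) >= self._max_items
            too_old = time.monotonic() - self._last_flush >= self._max_age
            if not (too_many or too_old):
                return
            batch, self._items = self._items, []
            self._last_flush = time.monotonic()
        # Write outside the lock so slow I/O does not block other callers
        self._flush_fn(batch)

    def flush(self) -> None:
        """Force a write of whatever is buffered (call this on shutdown)."""
        with self._lock:
            batch, self._items = self._items, []
            self._last_flush = time.monotonic()
        if batch:
            self._flush_fn(batch)


# Demo with a small batch size; 'written' stands in for the database
written = []
buffer = CostBuffer(flush_fn=written.append, max_items=3)
for i in range(7):
    buffer.add({"request_id": i, "total_cost": 0.01})
buffer.flush()
print([len(batch) for batch in written])  # [3, 3, 1]
```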

Implementation Checklist

Add the calculate_request_cost function to your API wrapper
Log every request’s cost with metadata (endpoint, user, feature)
Build a daily report of the top 20 most expensive requests
Identify patterns in expensive requests (large context, specific features)
Set per-request cost thresholds ($0.50, $1.00) and investigate requests exceeding them
Track cost distribution: what percentage of requests generate what percentage of spend
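The logging and threshold items in the checklist might look like the following sketch. The helper names `build_cost_record` and `log_cost`, the logger name, and the sample metadata values are assumptions for illustration; the $0.50 and $1.00 thresholds come from the checklist.

```python
import json
import logging
import time

logger = logging.getLogger("claude_costs")  # hypothetical logger name

# Per-request thresholds from the checklist, in USD
COST_THRESHOLDS = (0.50, 1.00)


def build_cost_record(total_cost: float, *, endpoint: str, user: str, feature: str) -> dict:
    """Attach metadata to a request cost and note which thresholds it exceeds."""
    return {
        "timestamp": time.time(),
        "endpoint": endpoint,
        "user": user,
        "feature": feature,
        "total_cost": round(total_cost, 6),
        "exceeded_thresholds": [t for t in COST_THRESHOLDS if total_cost > t],
    }


def log_cost(record: dict) -> None:
    """Log normal requests at INFO and threshold breaches at WARNING."""
    level = logging.WARNING if record["exceeded_thresholds"] else logging.INFO
    logger.log(level, json.dumps(record))


# Example with made-up metadata: a $0.72 request exceeds only the $0.50 threshold
record = build_cost_record(0.72, endpoint="/chat", user="u_123", feature="agent")
log_cost(record)
```

Keeping the record as a flat dict makes it easy to send the same data to a log stream and a database table.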

Measuring Impact

After one week of per-request tracking, generate a Pareto analysis. Typically, 10-20% of request types generate 60-80% of costs. These are your optimization targets. A common finding: long-running agentic sessions with growing context windows produce requests costing $0.50-$2.00 each, while 90% of requests cost under $0.05. Optimizing just the top 10% of requests (context windowing, model routing, caching) typically cuts total spend by 25-40%.
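A Pareto analysis over logged records can be sketched as below. It assumes each record is a dict with a `feature` label and a `total_cost` value, which is an illustrative schema rather than anything the API returns, and the sample records are made up purely to show the output shape.

```python
from collections import defaultdict


def pareto_by_key(records, key="feature"):
    """Return (name, cost, share, cumulative_share) rows, most expensive first."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec[key]] += rec["total_cost"]

    grand_total = sum(totals.values())
    rows = []
    cumulative = 0.0
    for name, cost in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        cumulative += cost
        share = cost / grand_total if grand_total else 0.0
        cumulative_share = cumulative / grand_total if grand_total else 0.0
        rows.append((name, cost, share, cumulative_share))
    return rows


# Made-up sample records for illustration only
records = [
    {"feature": "agent", "total_cost": 1.20},
    {"feature": "agent", "total_cost": 0.80},
    {"feature": "chat", "total_cost": 0.30},
    {"feature": "chat", "total_cost": 0.30},
    {"feature": "summarize", "total_cost": 0.40},
]
rows = pareto_by_key(records)
for name, cost, share, cumulative_share in rows:
    print(f"{name:<10} ${cost:.2f}  {share:.0%}  cumulative {cumulative_share:.0%}")
```

Request types at the top of the list, where the cumulative share climbs fastest, are the first candidates for context windowing, model routing, or caching.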
