Claude API Usage Metrics Every Team Needs
Most teams track one metric for Claude API usage: total monthly spend. That’s like monitoring a car’s fuel bill without checking MPG, tire pressure, or engine temperature. The seven metrics that actually drive cost optimization are: cost per request, tokens per dollar, cache hit rate, model distribution ratio, output-to-input ratio, error rate cost, and time-to-throttle. Together, they paint a picture that turns a $5,000/month bill into a $3,000/month bill.
The Setup
The Claude API response’s usage object provides the raw data for all seven metrics. Each response contains input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, and server_tool_use. Combined with the model identifier and your pricing table, these fields generate every metric a team needs. The challenge isn’t data availability – it’s the discipline of collecting, aggregating, and acting on the data consistently. Teams that review these metrics weekly cut costs 25-40% faster than those reviewing monthly.
The Math
How each metric translates to dollar savings:
Metric 1: Cost per request
- Baseline: $0.05/request (Sonnet 4.6, 5K input, 1.5K output)
- After optimization: $0.02/request (model routing + caching)
- At 20,000 requests/day: saves $600/day = $18,000/month
Metric 2: Cache hit rate
- Current: 30% cache hits
- Target: 70% cache hits
- Impact: 40% more cached reads at $0.30/MTok vs $3.00/MTok (Sonnet)
- On 10K daily requests with 20K shared tokens each: saves $216/day = $6,480/month
Metric 3: Model distribution
- Current: 100% Opus 4.7 ($5.00/$25.00)
- Optimal: 30% Opus, 50% Sonnet ($3.00/$15.00), 20% Haiku ($1.00/$5.00)
- On blended 10K daily requests: saves from $0.10 avg to $0.055 avg = $450/day = $13,500/month
The Technique
Build a metrics collector that computes all seven metrics from API responses.
import anthropic
from dataclasses import dataclass, field
from collections import defaultdict
from datetime import datetime
PRICING = {
"claude-opus-4-7": {"input": 5.00, "output": 25.00, "cache_read": 0.50},
"claude-sonnet-4-6": {"input": 3.00, "output": 15.00, "cache_read": 0.30},
"claude-haiku-4-5": {"input": 1.00, "output": 5.00, "cache_read": 0.10},
}
@dataclass
class MetricsCollector:
requests: int = 0
total_cost: float = 0.0
total_input_tokens: int = 0
total_output_tokens: int = 0
cache_eligible: int = 0
cache_hits: int = 0
errors: int = 0
error_cost: float = 0.0
model_counts: dict = field(default_factory=lambda: defaultdict(int))
costs_list: list = field(default_factory=list)
def record(self, model: str, usage, is_error: bool = False):
"""Record metrics from a single API response."""
prices = PRICING.get(model, PRICING["claude-sonnet-4-6"])
cost = (
usage.input_tokens * prices["input"] / 1_000_000
+ usage.output_tokens * prices["output"] / 1_000_000
)
cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
if cache_read > 0:
cost += cache_read * prices["cache_read"] / 1_000_000
self.requests += 1
self.total_cost += cost
self.total_input_tokens += usage.input_tokens
self.total_output_tokens += usage.output_tokens
self.model_counts[model] += 1
self.costs_list.append(cost)
# Cache tracking
if usage.input_tokens > 5000: # likely cache-eligible
self.cache_eligible += 1
if cache_read > 0:
self.cache_hits += 1
if is_error:
self.errors += 1
self.error_cost += cost
def report(self) -> dict:
"""Generate the 7 essential metrics."""
if self.requests == 0:
return {"error": "No data collected"}
avg_cost = self.total_cost / self.requests
total_tokens = self.total_input_tokens + self.total_output_tokens
tokens_per_dollar = total_tokens / self.total_cost if self.total_cost else 0
cache_rate = (self.cache_hits / self.cache_eligible * 100
if self.cache_eligible else 0)
output_ratio = (self.total_output_tokens / self.total_input_tokens
if self.total_input_tokens else 0)
error_pct = self.errors / self.requests * 100
# Model distribution
model_dist = {
model: f"{count / self.requests * 100:.1f}%"
for model, count in self.model_counts.items()
}
# P95 cost (95th percentile)
sorted_costs = sorted(self.costs_list)
p95_idx = int(len(sorted_costs) * 0.95)
p95_cost = sorted_costs[p95_idx] if sorted_costs else 0
return {
"1_cost_per_request": {
"average": f"${avg_cost:.4f}",
"p95": f"${p95_cost:.4f}",
"total": f"${self.total_cost:.2f}",
},
"2_tokens_per_dollar": {
"value": f"{tokens_per_dollar:,.0f}",
"benchmark": "Higher is better (Haiku ~200K, Opus ~40K)",
},
"3_cache_hit_rate": {
"rate": f"{cache_rate:.1f}%",
"target": "Above 60% for repeated prompts",
"eligible_requests": self.cache_eligible,
},
"4_model_distribution": model_dist,
"5_output_input_ratio": {
"ratio": f"{output_ratio:.2f}",
"note": "Low ratio (<0.1) may indicate over-prompting",
},
"6_error_rate_cost": {
"error_rate": f"{error_pct:.1f}%",
"wasted_cost": f"${self.error_cost:.2f}",
},
"7_request_volume": {
"total": self.requests,
"per_day_estimate": self.requests, # adjust for time window
},
}
def print_report(self):
"""Pretty-print the metrics report."""
report = self.report()
print("=" * 50)
print("CLAUDE API METRICS REPORT")
print("=" * 50)
for key, value in report.items():
metric_name = key.split("_", 1)[1].replace("_", " ").title()
print(f"\n{metric_name}:")
if isinstance(value, dict):
for k, v in value.items():
print(f" {k}: {v}")
else:
print(f" {value}")
# Usage
collector = MetricsCollector()
client = anthropic.Anthropic()
# Simulate collecting metrics from API calls
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{"role": "user", "content": "Summarize this document"}]
)
collector.record("claude-sonnet-4-6", response.usage)
collector.print_report()
The Tradeoffs
Collecting all seven metrics requires instrumenting every API call path, which takes engineering time. Start with the three highest-impact metrics: cost per request, cache hit rate, and model distribution. Add the others incrementally. Over-optimizing on one metric can hurt others – maximizing cache hit rate by extending TTL to 1 hour means paying 2x base input for cache writes ($6.00/MTok instead of $3.75/MTok on Sonnet 4.6), which only pays off after 2+ cache reads per cache entry. Track metrics holistically, not in isolation.
Implementation Checklist
- Instrument all API call sites with the MetricsCollector
- Set up automated daily reports delivered to Slack or email
- Establish baselines for each metric in the first week
- Set targets for the three most impactful metrics (cost/request, cache rate, model mix)
- Review metrics weekly in team standup
- Create action items when any metric deviates 20% from target
- Build a monthly trend dashboard showing all seven metrics over time
Measuring Impact
Compare your month-1 baselines to month-3 actuals. The metrics themselves cost nothing to collect – value comes from the optimizations they drive. Typical improvement trajectory: 10% cost reduction in month 1 (model routing), 20% in month 2 (caching), 30-40% by month 3 (all seven metrics optimized). At $5,000/month baseline, that’s $1,500-$2,000/month in savings by month 3, or $18,000-$24,000 annually.