Claude API Usage Metrics Every Team Needs

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS

Most teams track one metric for Claude API usage: total monthly spend. That's like monitoring a car's fuel bill without checking MPG, tire pressure, or engine temperature. The seven metrics that actually drive cost optimization are cost per request, tokens per dollar, cache hit rate, model distribution ratio, output-to-input ratio, error rate cost, and time-to-throttle. Together, they show where the money goes and what to change, which is how a $5,000/month bill becomes a $3,000/month bill.

The Setup

The usage object in every Claude API response provides the raw data. Its fields include input_tokens, output_tokens, cache_read_input_tokens, cache_creation_input_tokens, and server_tool_use. Combined with the model identifier, your pricing table, and a flag for responses you had to throw away, these fields cover six of the seven metrics; time-to-throttle also needs your rate limit, which the API returns in response headers. The challenge isn't data availability but the discipline of collecting, aggregating, and acting on the data consistently. Teams that review these metrics weekly cut costs 25-40% faster than those reviewing monthly.
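A minimal sketch of capturing these fields per request, assuming you log each call as a flat dict. The `usage_to_record` helper is hypothetical, and `SimpleNamespace` stands in for `response.usage` so the snippet runs offline. Because `input_tokens` excludes tokens read from or written to the cache, the record also stores the full prompt size as the sum of all three input fields.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def usage_to_record(model: str, usage) -> dict:
    """Flatten a Messages API usage object into a loggable dict (hypothetical helper)."""
    # Cache fields may be absent or None when caching isn't used
    cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
    cache_write = getattr(usage, "cache_creation_input_tokens", 0) or 0
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        "input_tokens": usage.input_tokens,
        "output_tokens": usage.output_tokens,
        "cache_read_input_tokens": cache_read,
        "cache_creation_input_tokens": cache_write,
        # input_tokens excludes cached tokens, so sum all three for prompt size
        "prompt_tokens": usage.input_tokens + cache_read + cache_write,
    }


# Stand-in for response.usage so the example runs without an API call
fake_usage = SimpleNamespace(
    input_tokens=200,
    output_tokens=350,
    cache_read_input_tokens=6000,
    cache_creation_input_tokens=None,
)
record = usage_to_record("claude-sonnet-4-6", fake_usage)
print(record["prompt_tokens"])  # 6200
```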

The Math

How each metric translates to dollar savings:

Metric 1: Cost per request
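An illustrative calculation using the per-million-token prices from the PRICING table in the collector later in this article. The request shape (2,000 input and 500 output tokens, no caching) and the 100,000 requests/month volume are hypothetical; the point is how a per-request cost scales into a monthly bill, and what the same traffic costs on Haiku 4.5.

```python
# USD per million tokens, taken from this article's PRICING table
PRICES = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00},
}


def cost_per_request(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one uncached request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000


# Hypothetical workload: 2,000 input + 500 output tokens, 100,000 requests/month
monthly_requests = 100_000
sonnet = cost_per_request("claude-sonnet-4-6", 2_000, 500)  # $0.0135
haiku = cost_per_request("claude-haiku-4-5", 2_000, 500)    # $0.0045

print(f"Sonnet: ${sonnet:.4f}/request, ${sonnet * monthly_requests:,.0f}/month")
print(f"Haiku:  ${haiku:.4f}/request, ${haiku * monthly_requests:,.0f}/month")
```

At this volume the 3x per-request gap is a $900/month difference ($1,350 vs $450), which is why cost per request is worth tracking per model rather than only in aggregate.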

Metric 2: Cache hit rate
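A sketch of how hit rate translates into dollars, assuming a hypothetical 10,000-token shared prefix on Sonnet 4.6, the 5-minute cache TTL (writes at 1.25x base input), and that every miss rewrites the cache entry while every hit reads it. Prices come from the article's PRICING table.

```python
# Sonnet 4.6 prices in USD per million tokens; CACHE_WRITE assumes the 5-minute TTL
INPUT, CACHE_READ, CACHE_WRITE = 3.00, 0.30, 3.75
PREFIX_TOKENS = 10_000  # hypothetical shared system prompt


def prefix_cost(hit_rate: float) -> float:
    """Expected cost of the cached prefix per request.

    Assumes every miss rewrites the cache entry and every hit reads it.
    """
    per_mtok = hit_rate * CACHE_READ + (1 - hit_rate) * CACHE_WRITE
    return PREFIX_TOKENS * per_mtok / 1_000_000


uncached = PREFIX_TOKENS * INPUT / 1_000_000  # $0.0300 per request without caching
for rate in (0.0, 0.6, 0.9):
    cost = prefix_cost(rate)
    print(f"hit rate {rate:.0%}: ${cost:.4f}/request "
          f"({cost / uncached - 1:+.0%} vs uncached)")
```

Under these assumptions caching only beats no caching once the hit rate passes about 22% (0.75 / 3.45), and a 60% hit rate cuts the prefix cost from $0.0300 to $0.0168 per request, a 44% saving.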

Metric 3: Model distribution
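A sketch of the blended cost behind this metric: the average cost per request is each model's share of traffic times its per-request cost. The 2,000-input / 500-output request and the 50/50 target mix are hypothetical; prices come from the article's PRICING table.

```python
# Per-request cost of a hypothetical 2,000-input / 500-output request,
# from this article's PRICING table (USD per million tokens)
COST_PER_REQUEST = {
    "claude-sonnet-4-6": (2_000 * 3.00 + 500 * 15.00) / 1_000_000,  # $0.0135
    "claude-haiku-4-5": (2_000 * 1.00 + 500 * 5.00) / 1_000_000,    # $0.0045
}


def blended_cost(distribution: dict) -> float:
    """Average cost per request for a traffic mix given as {model: share}."""
    if abs(sum(distribution.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * COST_PER_REQUEST[model] for model, share in distribution.items())


all_sonnet = blended_cost({"claude-sonnet-4-6": 1.0})
half_routed = blended_cost({"claude-sonnet-4-6": 0.5, "claude-haiku-4-5": 0.5})
print(f"100% Sonnet: ${all_sonnet:.4f}/request")
print(f"50/50 mix:   ${half_routed:.4f}/request "
      f"({1 - half_routed / all_sonnet:.0%} cheaper)")
```

Routing half of this traffic to Haiku cuts the blended cost by a third; the model distribution metric tells you how far your actual mix is from whatever target you set.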

The Technique

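Build a metrics collector that computes the metrics from API responses. For the seventh, it reports request volume, which you compare against your rate limit to estimate time-to-throttle.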

import anthropic
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import datetime, timezone

# USD per million tokens. cache_write assumes the default 5-minute TTL
# (1.25x base input); 1-hour cache writes are billed at 2x base input.
PRICING = {
    "claude-opus-4-7": {"input": 5.00, "output": 25.00, "cache_read": 0.50, "cache_write": 6.25},
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00, "cache_read": 0.30, "cache_write": 3.75},
    "claude-haiku-4-5": {"input": 1.00, "output": 5.00, "cache_read": 0.10, "cache_write": 1.25},
}

CACHE_ELIGIBLE_TOKENS = 5000  # heuristic: prompts above this size are worth caching

@dataclass
class MetricsCollector:
    requests: int = 0
    total_cost: float = 0.0
    total_input_tokens: int = 0  # uncached + cache-read + cache-write prompt tokens
    total_output_tokens: int = 0
    cache_eligible: int = 0
    cache_hits: int = 0
    errors: int = 0
    error_cost: float = 0.0
    model_counts: dict = field(default_factory=lambda: defaultdict(int))
    costs_list: list = field(default_factory=list)
    started_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def record(self, model: str, usage, is_error: bool = False):
        """Record metrics from a single API response.

        Set is_error=True for billed responses you had to discard (for example,
        output cut off at max_tokens or failing validation). Calls rejected with
        an API error return no usage object, so they never reach this method.
        """
        # Unknown model IDs fall back to Sonnet pricing; keep PRICING up to date
        prices = PRICING.get(model, PRICING["claude-sonnet-4-6"])

        # Cache fields can be missing or None when caching isn't used
        cache_read = getattr(usage, "cache_read_input_tokens", 0) or 0
        cache_write = getattr(usage, "cache_creation_input_tokens", 0) or 0

        cost = (
            usage.input_tokens * prices["input"]
            + usage.output_tokens * prices["output"]
            + cache_read * prices["cache_read"]
            + cache_write * prices["cache_write"]
        ) / 1_000_000

        # input_tokens excludes cached tokens, so the full prompt is the sum
        prompt_tokens = usage.input_tokens + cache_read + cache_write

        self.requests += 1
        self.total_cost += cost
        self.total_input_tokens += prompt_tokens
        self.total_output_tokens += usage.output_tokens
        self.model_counts[model] += 1
        self.costs_list.append(cost)

        # Cache tracking: judge eligibility on the full prompt, not input_tokens,
        # or cache hits (which shrink input_tokens) would never be counted
        if prompt_tokens > CACHE_ELIGIBLE_TOKENS:
            self.cache_eligible += 1
            if cache_read > 0:
                self.cache_hits += 1

        if is_error:
            self.errors += 1
            self.error_cost += cost

    def report(self) -> dict:
        """Generate the 7 essential metrics."""
        if self.requests == 0:
            return {"error": "No data collected"}

        avg_cost = self.total_cost / self.requests
        total_tokens = self.total_input_tokens + self.total_output_tokens
        tokens_per_dollar = total_tokens / self.total_cost if self.total_cost else 0
        cache_rate = (self.cache_hits / self.cache_eligible * 100
                      if self.cache_eligible else 0)
        output_ratio = (self.total_output_tokens / self.total_input_tokens
                        if self.total_input_tokens else 0)
        error_pct = self.errors / self.requests * 100

        # Model distribution
        model_dist = {
            model: f"{count / self.requests * 100:.1f}%"
            for model, count in self.model_counts.items()
        }

        # P95 cost (nearest rank: the ceil(0.95 * n)-th smallest value)
        sorted_costs = sorted(self.costs_list)
        p95_idx = max(0, (95 * len(sorted_costs) + 99) // 100 - 1)
        p95_cost = sorted_costs[p95_idx]

        # Daily volume, extrapolated only once at least an hour of data exists
        elapsed = (datetime.now(timezone.utc) - self.started_at).total_seconds()
        per_day = (f"{self.requests * 86_400 / elapsed:,.0f}"
                   if elapsed >= 3_600 else "n/a (collect for at least 1 hour)")

        return {
            "1_cost_per_request": {
                "average": f"${avg_cost:.4f}",
                "p95": f"${p95_cost:.4f}",
                "total": f"${self.total_cost:.2f}",
            },
            "2_tokens_per_dollar": {
                "value": f"{tokens_per_dollar:,.0f}",
                "benchmark": "Higher is better (Haiku ~200K, Opus ~40K)",
            },
            "3_cache_hit_rate": {
                "rate": f"{cache_rate:.1f}%",
                "target": "Above 60% for repeated prompts",
                "eligible_requests": self.cache_eligible,
            },
            "4_model_distribution": model_dist,
            "5_output_input_ratio": {
                "ratio": f"{output_ratio:.2f}",
                "note": "Low ratio (<0.1) may indicate over-prompting",
            },
            "6_error_rate_cost": {
                "error_rate": f"{error_pct:.1f}%",
                "wasted_cost": f"${self.error_cost:.2f}",
            },
            "7_request_volume": {
                "total": self.requests,
                "per_day_estimate": per_day,
                "note": "Compare with your rate limit to estimate time-to-throttle",
            },
        }

    def print_report(self):
        """Pretty-print the metrics report."""
        report = self.report()
        if "error" in report:  # no "N_" prefix to strip, so print it directly
            print(report["error"])
            return
        print("=" * 50)
        print("CLAUDE API METRICS REPORT")
        print("=" * 50)
        for key, value in report.items():
            metric_name = key.split("_", 1)[1].replace("_", " ").title()
            print(f"\n{metric_name}:")
            if isinstance(value, dict):
                for k, v in value.items():
                    print(f"  {k}: {v}")
            else:
                print(f"  {value}")


# Usage
collector = MetricsCollector()
client = anthropic.Anthropic()

# Record usage from a real API call (requires ANTHROPIC_API_KEY to be set)
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Summarize this document"}]
)
collector.record("claude-sonnet-4-6", response.usage)

collector.print_report()
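The collector's seventh metric reports raw request volume; turning that into time-to-throttle also needs your rate limit. A minimal sketch, assuming the request limit is per minute and that capacity refills continuously (a token-bucket model), with values read from the `anthropic-ratelimit-requests-limit` and `anthropic-ratelimit-requests-remaining` response headers. Check Anthropic's rate-limit documentation for the exact header names; both helper functions are hypothetical, and the header values below are made up so the example runs offline.

```python
import math


def time_to_throttle_minutes(limit_per_min: float, remaining: float,
                             current_per_min: float) -> float:
    """Minutes until capacity runs out under a token-bucket model (hypothetical helper).

    Capacity refills at about limit_per_min, so it only drains when current
    usage exceeds the limit; otherwise the result is infinite.
    """
    net_drain = current_per_min - limit_per_min
    if net_drain <= 0:
        return math.inf
    return remaining / net_drain


def throttle_from_headers(headers, current_rpm: float) -> float:
    """Read request limits from response headers (header names assumed)."""
    limit = float(headers["anthropic-ratelimit-requests-limit"])
    remaining = float(headers["anthropic-ratelimit-requests-remaining"])
    return time_to_throttle_minutes(limit, remaining, current_rpm)


# With the Python SDK, headers come from the raw-response wrapper, e.g.:
#   raw = client.messages.with_raw_response.create(model=..., max_tokens=..., messages=...)
#   headers, message = raw.headers, raw.parse()
# Hypothetical header values:
sample_headers = {
    "anthropic-ratelimit-requests-limit": "50",
    "anthropic-ratelimit-requests-remaining": "30",
}
print(throttle_from_headers(sample_headers, current_rpm=80))  # 1.0
print(throttle_from_headers(sample_headers, current_rpm=40))  # inf
```

The same calculation applies to the token-based rate-limit headers, which are often the binding limit for long prompts.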

The Tradeoffs

Collecting all seven metrics requires instrumenting every API call path, which takes engineering time. Start with the three highest-impact metrics: cost per request, cache hit rate, and model distribution. Add the others incrementally. Over-optimizing on one metric can hurt others. For example, maximizing cache hit rate by extending the cache TTL to 1 hour means paying 2x base input for cache writes ($6.00/MTok instead of the 5-minute TTL's $3.75/MTok on Sonnet 4.6), which only beats not caching once each cache entry is read at least twice. Track metrics holistically, not in isolation.
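The break-even point is easy to check. A minimal sketch comparing a reused 1M-token prompt prefix with and without a 1-hour cache, at the Sonnet 4.6 prices quoted above and assuming every read lands within the TTL; `reuse_cost` is a hypothetical helper:

```python
BASE = 3.00            # Sonnet 4.6 base input price, USD per million tokens
WRITE_1H = 2 * BASE    # 1-hour TTL cache write: $6.00
READ = 0.1 * BASE      # cache read: $0.30


def reuse_cost(reads: int, cached: bool) -> float:
    """Cost of sending the same 1M-token prefix once, then `reads` more times."""
    if not cached:
        return (reads + 1) * BASE
    return WRITE_1H + reads * READ


for reads in range(4):
    plain, cached = reuse_cost(reads, False), reuse_cost(reads, True)
    print(f"{reads} reads: uncached ${plain:.2f}, 1h cache ${cached:.2f}")
```

With one read the 1-hour cache still costs $0.30 more ($6.30 vs $6.00); from the second read on it wins ($6.60 vs $9.00). The 5-minute TTL pays off after a single read and refreshes on each hit, so the 1-hour TTL earns its premium mainly when reads are spaced more than five minutes apart.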

Implementation Checklist

Measuring Impact

Compare your month-1 baselines to month-3 actuals. Collecting the metrics adds no API spend; the value comes from the optimizations they drive. A typical improvement trajectory: 10% cost reduction in month 1 (model routing), 20% in month 2 (caching), and 30-40% by month 3 (all seven metrics optimized). At a $5,000/month baseline, that's $1,500-$2,000/month in savings by month 3, or $18,000-$24,000 annually.
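A minimal sketch of the month-1 versus month-3 comparison, assuming both bills cover full months of comparable traffic. The `savings` helper is hypothetical, and the $3,000 month-3 bill is an example input, not a measured result:

```python
def savings(baseline_monthly: float, current_monthly: float) -> dict:
    """Monthly, annualized, and percentage savings versus a baseline month."""
    monthly = baseline_monthly - current_monthly
    return {
        "monthly_savings": monthly,
        "annualized_savings": monthly * 12,
        "reduction_pct": monthly * 100 / baseline_monthly,
    }


# The article's $5,000 baseline against a hypothetical $3,000 month-3 bill
print(savings(5_000, 3_000))
# {'monthly_savings': 2000, 'annualized_savings': 24000, 'reduction_pct': 40.0}
```

Annualizing assumes the month-3 rate holds for the rest of the year, the same assumption behind the $18,000-$24,000 figure.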