How to Choose the Cheapest Claude Model
Running all API traffic through Claude Opus 4.7 at $5.00/$25.00 per million tokens costs $75,000/month at enterprise scale. Routing 70% of simple tasks to Haiku 4.5 at $1.00/$5.00 per million tokens and keeping 30% on Opus drops that to $33,000/month — a $42,000/month reduction.
The Setup
Most production Claude deployments handle a mix of task complexities. A customer support pipeline might classify tickets (simple), draft responses (moderate), and analyze escalation patterns (complex). Using one model for all three wastes money on the simple tasks without improving results.
The decision framework is straightforward: identify the cheapest model that meets your quality threshold for each task type. This guide provides a task-to-model mapping backed by cost calculations and a scoring methodology you can apply to your own workload.
The Math
Enterprise scenario: 1,000,000 API requests/month, typical 5K input + 2K output tokens.
All Opus 4.7:
- Input: 5B tokens * $5.00/MTok = $25,000
- Output: 2B tokens * $25.00/MTok = $50,000
- Total: $75,000/month
70/30 routing (700K Haiku, 300K Opus):
- Haiku input: 3.5B * $1.00/MTok = $3,500
- Haiku output: 1.4B * $5.00/MTok = $7,000
- Opus input: 1.5B * $5.00/MTok = $7,500
- Opus output: 600M * $25.00/MTok = $15,000
- Total: $33,000/month
Savings: $42,000/month ($504,000/year)
Even a conservative 50/50 split saves $30,000/month.
The Technique
Build a task classification system that scores each request and routes it to the appropriate model tier.
import anthropic
from enum import Enum
client = anthropic.Anthropic()
class ModelTier(Enum):
HAIKU = "claude-haiku-4-5-20251001" # $1/$5 per MTok
SONNET = "claude-sonnet-4-6" # $3/$15 per MTok
OPUS = "claude-opus-4-7" # $5/$25 per MTok
# Task-to-model mapping based on quality requirements
TASK_MODEL_MAP = {
# Haiku tier: structured output, low ambiguity
"sentiment_analysis": ModelTier.HAIKU,
"language_detection": ModelTier.HAIKU,
"entity_extraction": ModelTier.HAIKU,
"format_conversion": ModelTier.HAIKU,
"spam_detection": ModelTier.HAIKU,
"pii_detection": ModelTier.HAIKU,
"keyword_extraction": ModelTier.HAIKU,
# Sonnet tier: moderate reasoning, good writing
"summarization": ModelTier.SONNET,
"email_drafting": ModelTier.SONNET,
"code_explanation": ModelTier.SONNET,
"data_analysis": ModelTier.SONNET,
"content_editing": ModelTier.SONNET,
# Opus tier: complex reasoning, critical output
"code_generation": ModelTier.OPUS,
"architecture_review": ModelTier.OPUS,
"legal_analysis": ModelTier.OPUS,
"research_synthesis": ModelTier.OPUS,
"creative_writing": ModelTier.OPUS,
}
def select_model(task_type: str, quality_override: bool = False) -> str:
"""Select the cheapest model for a task type.
Args:
task_type: The classified task type.
quality_override: Force Opus for critical requests.
"""
if quality_override:
return ModelTier.OPUS.value
tier = TASK_MODEL_MAP.get(task_type, ModelTier.SONNET)
return tier.value
def process_request(task_type: str, prompt: str, quality_override: bool = False) -> dict:
model = select_model(task_type, quality_override)
response = client.messages.create(
model=model,
max_tokens=2048,
messages=[{"role": "user", "content": prompt}],
)
return {
"model": model,
"result": response.content[0].text,
"input_tokens": response.usage.input_tokens,
"output_tokens": response.usage.output_tokens,
}
To validate your routing decisions, run a quality comparison batch:
#!/bin/bash
# Quick cost comparison across models for a test prompt
curl -s https://api.anthropic.com/v1/messages \
-H "x-api-key: $ANTHROPIC_API_KEY" \
-H "content-type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-haiku-4-5-20251001",
"max_tokens": 100,
"messages": [{"role": "user", "content": "Classify as POSITIVE, NEGATIVE, or NEUTRAL: Great product, fast shipping."}]
}' | python3 -c "
import sys, json
r = json.load(sys.stdin)
inp = r['usage']['input_tokens']
out = r['usage']['output_tokens']
cost = (inp * 1 + out * 5) / 1_000_000
print(f'Haiku - Input: {inp}, Output: {out}, Cost: \${cost:.6f}')
"
The Tradeoffs
Task classification itself adds latency and complexity. For low-volume applications (under 100 requests/day), the engineering effort of building and maintaining a model router may exceed the cost savings.
Haiku 4.5 caps at 200K context tokens. Workloads requiring more than 200K input tokens per request are limited to Sonnet 4.6 or Opus 4.7, both of which support 1M context windows.
Edge cases between task categories will cause misrouting. Build in a feedback loop: if users flag low-quality responses, automatically upgrade that task category to the next model tier.
Batch processing stacks with model routing for maximum savings. Haiku 4.5 batch pricing is $0.50/$2.50 per MTok (input/output), compared to standard Opus at $5.00/$25.00. That is a 10x cost difference. A pipeline processing 100,000 classification requests per day at 2K input and 500 output tokens each costs $125/day on batch Haiku versus $1,250/day on standard Opus. The annual difference is $410,625. Even moving from standard Haiku ($1.00/$5.00) to batch Haiku saves $37,500/year at that volume. Apply batch mode to every non-real-time task category in your router for the deepest savings.
Implementation Checklist
- Export one week of API logs with request metadata
- Classify each request into a task category
- Map categories to model tiers using the framework above
- Run 50+ A/B comparisons per category between current and proposed model
- Deploy routing for the category with highest volume and clearest quality match
- Expand to additional categories after one week of monitoring
- Review routing assignments monthly as models improve
- Set up automated quality regression alerts that trigger when a routed category’s accuracy drops below its baseline threshold
Measuring Impact
Set up a dashboard tracking: requests by model (pie chart), cost by model (stacked bar), quality scores by task category (line chart over time). The primary success metric is total monthly spend. Secondary metrics include mean quality score per task category and the percentage of requests requiring quality override to Opus. Aim for less than 5% override rate — higher than that suggests your task classification needs refinement.
Related Guides
- Why Is Claude Code Expensive — understanding the root causes of high Claude API bills
- Claude Code Token Usage Optimization — reduce tokens consumed regardless of model choice
- Claude Code Monthly Cost Breakdown — benchmarks for realistic Claude spending