Batch API Cost Calculator for Claude Models
Batch pricing on Claude is straightforward: divide the standard price by two. Opus 4.7 input goes from $5.00 to $2.50 per million tokens. Output goes from $25.00 to $12.50. But calculating the actual cost for a specific workload requires knowing your token distribution, and most teams overestimate input tokens while underestimating output tokens.
The Setup
You are planning to migrate three workloads to the Batch API and need precise cost projections before committing. Workload A is a summarization task (high input, low output). Workload B is content generation (moderate input, high output). Workload C is classification (low input, minimal output).
Each workload has different cost characteristics, and the model choice matters. Running workload B on Opus 4.7 batch costs $15.00 per 1,000 requests. Running it on Haiku 4.5 batch costs $1.50 per 1,000 requests. The calculator below handles all permutations.
The Math
Batch pricing per million tokens (all verified):
| Model | Batch Input | Batch Output | Standard Input | Standard Output |
|---|---|---|---|---|
| Opus 4.7 | $2.50 | $12.50 | $5.00 | $25.00 |
| Sonnet 4.6 | $1.50 | $7.50 | $3.00 | $15.00 |
| Haiku 4.5 | $0.50 | $2.50 | $1.00 | $5.00 |
| Opus 4.1 | $7.50 | $37.50 | $15.00 | $75.00 |
Three workload examples (1,000 requests each):
Workload A (summarization): 10K input + 500 output per request
- Opus 4.7 batch: $25.00 + $6.25 = $31.25
- Haiku 4.5 batch: $5.00 + $1.25 = $6.25
Workload B (content gen): 5K input + 3K output per request
- Opus 4.7 batch: $12.50 + $37.50 = $50.00
- Haiku 4.5 batch: $2.50 + $7.50 = $10.00
Workload C (classification): 2K input + 200 output per request
- Opus 4.7 batch: $5.00 + $2.50 = $7.50
- Haiku 4.5 batch: $1.00 + $0.50 = $1.50
The Technique
Here is a comprehensive cost calculator that handles any model, token count, and batch size:
from dataclasses import dataclass
@dataclass
class BatchCostResult:
model: str
requests: int
total_input_tokens: int
total_output_tokens: int
batch_input_cost: float
batch_output_cost: float
batch_total: float
standard_total: float
savings: float
savings_pct: float
cost_per_request: float
def __str__(self) -> str:
return (
f"Model: {self.model}\n"
f"Requests: {self.requests:,}\n"
f"Batch cost: ${self.batch_total:.2f} "
f"(in: ${self.batch_input_cost:.2f}, "
f"out: ${self.batch_output_cost:.2f})\n"
f"Standard cost: ${self.standard_total:.2f}\n"
f"Savings: ${self.savings:.2f} ({self.savings_pct:.1f}%)\n"
f"Per request: ${self.cost_per_request:.4f}"
)
BATCH_PRICES = {
"opus-4.7": {"in": 2.50, "out": 12.50, "std_in": 5.00, "std_out": 25.00},
"sonnet-4.6": {"in": 1.50, "out": 7.50, "std_in": 3.00, "std_out": 15.00},
"haiku-4.5": {"in": 0.50, "out": 2.50, "std_in": 1.00, "std_out": 5.00},
"opus-4.1": {"in": 7.50, "out": 37.50, "std_in": 15.00, "std_out": 75.00},
}
def calculate_batch_cost(
model: str,
num_requests: int,
avg_input_tokens: int,
avg_output_tokens: int
) -> BatchCostResult:
"""Calculate batch vs standard API cost."""
if model not in BATCH_PRICES:
raise ValueError(f"Unknown model: {model}. "
f"Options: {list(BATCH_PRICES.keys())}")
p = BATCH_PRICES[model]
total_in = num_requests * avg_input_tokens
total_out = num_requests * avg_output_tokens
batch_in_cost = total_in * p["in"] / 1e6
batch_out_cost = total_out * p["out"] / 1e6
batch_total = batch_in_cost + batch_out_cost
std_in_cost = total_in * p["std_in"] / 1e6
std_out_cost = total_out * p["std_out"] / 1e6
std_total = std_in_cost + std_out_cost
savings = std_total - batch_total
return BatchCostResult(
model=model,
requests=num_requests,
total_input_tokens=total_in,
total_output_tokens=total_out,
batch_input_cost=batch_in_cost,
batch_output_cost=batch_out_cost,
batch_total=batch_total,
standard_total=std_total,
savings=savings,
savings_pct=(savings / std_total * 100) if std_total > 0 else 0,
cost_per_request=batch_total / num_requests if num_requests > 0 else 0
)
# Compare all models for the same workload
for model in BATCH_PRICES:
result = calculate_batch_cost(model, 10000, 5000, 3000)
print(result)
print()
For a quick command-line calculation:
# Quick batch cost estimator
python3 -c "
import sys
models = {
'opus-4.7': (2.50, 12.50),
'sonnet-4.6': (1.50, 7.50),
'haiku-4.5': (0.50, 2.50),
}
reqs = 10000
inp = 5000
out = 3000
print(f'Batch cost for {reqs:,} requests ({inp:,} in + {out:,} out tokens each):')
print(f'{\"\":-^60}')
for name, (pin, pout) in models.items():
cost_in = reqs * inp * pin / 1e6
cost_out = reqs * out * pout / 1e6
total = cost_in + cost_out
per_req = total / reqs
print(f'{name:>12}: \${total:>10,.2f} total (\${per_req:.4f}/request)')
"
Output:
Batch cost for 10,000 requests (5,000 in + 3,000 out tokens each):
------------------------------------------------------------
opus-4.7: $ 500.00 total ($0.0500/request)
sonnet-4.6: $ 300.00 total ($0.0300/request)
haiku-4.5: $ 100.00 total ($0.0100/request)
To estimate tokens from your actual data before batch submission:
def estimate_tokens_from_text(text: str) -> int:
"""Rough token estimate: ~4 chars per token for English."""
return len(text) // 4
# Sample your dataset to estimate average tokens
import json
import random
data = json.load(open("dataset.json"))
sample = random.sample(data, min(100, len(data)))
input_tokens = [estimate_tokens_from_text(item["prompt"]) for item in sample]
avg_input = sum(input_tokens) // len(input_tokens)
print(f"Estimated avg input tokens: {avg_input}")
print(f"Run calculator with avg_input_tokens={avg_input}")
The Tradeoffs
Cost calculators have inherent limitations:
- Output token estimation is hard. You know your input tokens precisely, but output tokens depend on the model’s response length. Use
max_tokensas an upper bound and actual historical data for better estimates. - Cache stacking changes the math. If you combine batch with prompt caching, the effective input rate drops to 5% of standard ($0.25/MTok on Opus 4.7 for cached batch reads). The calculator above does not account for caching – add a separate calculation layer.
- Token counting varies by model. Opus 4.7 uses a new tokenizer that may produce up to 35% more tokens for the same text compared to older models. A prompt that costs X tokens on Sonnet may cost 1.35X on Opus.
- No free tier. Unlike some competitors, Claude batch API requires API credits. There is no free batch processing tier.
Implementation Checklist
- Sample 100 representative requests from your workload
- Count actual input tokens using the Anthropic tokenizer or estimate at 4 chars/token
- Estimate output tokens from historical response lengths or
max_tokenssettings - Run the calculator for all candidate models
- Compare batch cost against your current standard API spend
- Select the model that meets quality requirements at the lowest batch cost
- Start with a small test batch (100 requests) before committing to 100K
Measuring Impact
After running your first production batch:
- Predicted vs actual cost: Compare calculator output against the batch invoice. Should match within 5% if token estimates are accurate.
- Output token variance: Compare estimated output tokens against actual. Large variance means your estimator needs calibration.
- Model quality check: If you downgraded from Opus to Sonnet for cost savings, sample 50 outputs and verify quality meets your threshold.
- Adjust future calculator inputs based on actual token usage data from completed batches