
Claude 200K vs 1M Context Cost (2026)

Last updated: April 19, 2026

Filling Claude Haiku 4.5’s 200K context window costs $0.20 per request. Filling Opus 4.7’s 1M context window costs $5.00 per request — 25 times more. Most tasks that appear to need 1M tokens can be accomplished with a 200K window and targeted context selection.

The Setup

Claude Opus 4.7 and Sonnet 4.6 offer 1 million token context windows. Haiku 4.5 offers 200K tokens. The question is whether you actually need 1M tokens or whether 200K is sufficient for your workload.

Context window size sets a hard limit on capacity; what you pay depends on how many tokens you actually send. A 1M window does not have to be filled, but many developers fill it anyway, loading entire codebases, full documentation sets, or complete conversation histories because the window allows it.

This guide compares the cost implications of each context window tier and shows how to determine which window size your workload truly requires.

The Math

Maximum cost per request by model and context window:

Model	Window	Max Input Cost	Typical Cost (50% fill)
Haiku 4.5	200K	$0.20	$0.10
Sonnet 4.6	1M	$3.00	$1.50
Opus 4.7	1M	$5.00	$2.50
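The per-request figures above are input tokens multiplied by the per-million-token input rate. A minimal sketch of that arithmetic, assuming the input rates quoted in this guide ($1, $3, and $5 per MTok) and ignoring output tokens, caching, and any long-context pricing tiers. The names `INPUT_RATE_PER_MTOK` and `input_cost` are illustrative, not part of any SDK:

```python
# Input rates in USD per million tokens, as quoted in this guide (assumed current).
INPUT_RATE_PER_MTOK = {
    "haiku-4.5": 1.0,
    "sonnet-4.6": 3.0,
    "opus-4.7": 5.0,
}

# Context window sizes in tokens, as listed in the table above.
WINDOW = {
    "haiku-4.5": 200_000,
    "sonnet-4.6": 1_000_000,
    "opus-4.7": 1_000_000,
}


def input_cost(model: str, input_tokens: int) -> float:
    """Return the input-side cost in USD for a single request."""
    return input_tokens * INPUT_RATE_PER_MTOK[model] / 1_000_000


for model, window in WINDOW.items():
    full = input_cost(model, window)
    half = input_cost(model, window // 2)
    print(f"{model}: full window ${full:.2f}, 50% fill ${half:.2f}")
```

Output tokens are billed separately, so real per-request totals will be somewhat higher than these input-only numbers.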

Monthly cost at 100 requests/day filling 50% of the context window:

Model	Monthly Cost	vs Haiku
Haiku 4.5 (100K avg)	$300	Baseline
Sonnet 4.6 (500K avg)	$4,500	15x
Opus 4.7 (500K avg)	$7,500	25x

But if you use Sonnet/Opus with targeted 50K context:

Model	Monthly Cost (50K context)	vs Haiku (100K avg)
Sonnet 4.6 (50K)	$450	1.5x
Opus 4.7 (50K)	$750	2.5x

Context discipline matters more than model choice. Sonnet with 50K context ($450/month) costs less than Haiku at a full 200K context ($600/month, assuming every request runs near capacity), and Opus with 50K context ($750/month) costs only 1.25x as much.
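The monthly figures in the tables above can be reproduced with a few lines of arithmetic. This is a sketch under the same assumptions the tables appear to use: 100 requests per day, a 30-day month, and input-token cost only. The function name `monthly_input_cost` and the scenario labels are illustrative:

```python
def monthly_input_cost(
    rate_per_mtok: float,
    avg_input_tokens: int,
    requests_per_day: int = 100,
    days: int = 30,
) -> float:
    """Monthly input-side spend in USD for a steady request volume."""
    per_request = avg_input_tokens * rate_per_mtok / 1_000_000
    return per_request * requests_per_day * days


# (rate in USD per MTok, average input tokens per request)
scenarios = {
    "Haiku 4.5 @ 100K": (1.0, 100_000),
    "Haiku 4.5 @ 200K": (1.0, 200_000),
    "Sonnet 4.6 @ 500K": (3.0, 500_000),
    "Opus 4.7 @ 500K": (5.0, 500_000),
    "Sonnet 4.6 @ 50K": (3.0, 50_000),
    "Opus 4.7 @ 50K": (5.0, 50_000),
}

for label, (rate, tokens) in scenarios.items():
    print(f"{label}: ${monthly_input_cost(rate, tokens):,.0f}/month")
```

Changing `requests_per_day` or the average token count shows quickly how much of the bill comes from context size rather than from the model rate.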

The Technique

Determine Your Actual Context Requirements

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def analyze_context_needs(
    messages_sample: list,
    model: str = "claude-sonnet-4-6",
) -> dict:
    """Analyze a sample of requests to determine actual context needs."""
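    # An empty sample would fail on the percentile lookups and the division below
    if not messages_sample:
        raise ValueError("messages_sample must contain at least one request")
    token_counts = []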
    for msg_set in messages_sample:
        count = client.messages.count_tokens(
            model=model,
            messages=msg_set,
        )
        token_counts.append(count.input_tokens)
    token_counts.sort()
    p50 = token_counts[len(token_counts) // 2]
    p90 = token_counts[int(len(token_counts) * 0.9)]
    p99 = token_counts[int(len(token_counts) * 0.99)]
    max_tokens = max(token_counts)
    return {
        "sample_size": len(token_counts),
        "p50_tokens": p50,
        "p90_tokens": p90,
        "p99_tokens": p99,
        "max_tokens": max_tokens,
        "fits_200k": f"{sum(1 for t in token_counts if t <= 200000) / len(token_counts) * 100:.1f}%",
        "recommendation": "haiku_200k" if p99 <= 200000 else "sonnet_or_opus_1m",
    }
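If you already log token usage (for example, the `usage.input_tokens` value returned with each API response), the same summary can be computed offline without calling the token-counting endpoint. The following is a sketch, not a replacement for the function above: `nearest_rank` and `summarize_token_counts` are hypothetical helper names, the sample counts are made up, and the percentile rule is the nearest-rank method, which differs slightly from the index truncation used above.

```python
import math


def nearest_rank(sorted_values: list, pct: int) -> int:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize_token_counts(token_counts: list, limit: int = 200_000) -> dict:
    """Summarize logged input-token counts against a context limit."""
    if not token_counts:
        raise ValueError("token_counts must not be empty")
    ordered = sorted(token_counts)
    within = sum(1 for t in ordered if t <= limit)
    return {
        "p50": nearest_rank(ordered, 50),
        "p90": nearest_rank(ordered, 90),
        "p99": nearest_rank(ordered, 99),
        "max": ordered[-1],
        "share_within_limit": within / len(ordered),
    }


# Hypothetical input-token counts from ten logged requests
sample = [8_000, 12_000, 15_000, 22_000, 30_000, 41_000, 55_000, 90_000, 180_000, 320_000]
summary = summarize_token_counts(sample)
print(summary)
```

In this made-up sample the p90 fits within 200K but the p99 does not, which is the case where routing only the outliers to a 1M-window model is worth considering.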

Model Selection Based on Context Needs

def select_by_context(
    input_tokens: int,
    task_complexity: str = "moderate",
    cost_priority: bool = True,
) -> str:
    """Select the cheapest model that fits the context requirement."""
    # Haiku: 200K max context, cheapest
    # Sonnet: 1M max context, mid-price
    # Opus: 1M max context, highest quality + price

    if input_tokens <= 200_000:
        # Complex work goes to Opus regardless of cost priority
        if task_complexity == "complex":
            return "claude-opus-4-7"  # $5/$25 per MTok
        # Moderate work uses Sonnet only when cost is not the priority
        if task_complexity == "moderate" and not cost_priority:
            return "claude-sonnet-4-6"  # $3/$15 per MTok
        return "claude-haiku-4-5-20251001"  # $1/$5 per MTok
    else:
        # Must use Sonnet or Opus for >200K context
        if task_complexity == "complex":
            return "claude-opus-4-7"
        return "claude-sonnet-4-6"
# Examples
print(select_by_context(50_000, "simple"))    # -> Haiku
print(select_by_context(50_000, "complex"))   # -> Opus
print(select_by_context(300_000, "simple"))   # -> Sonnet (Haiku can't handle >200K)
print(select_by_context(300_000, "complex"))  # -> Opus

Cost-Aware Context Sizing

def fit_context_to_budget(
    documents: list,
    max_budget_per_request: float = 0.50,
    model: str = "claude-opus-4-7",
) -> dict:
    """Select documents to include within a cost budget."""
    rates = {
        "claude-opus-4-7": 5.0,
        "claude-sonnet-4-6": 3.0,
        "claude-haiku-4-5-20251001": 1.0,
    }
    rate = rates[model]
    max_tokens = int(max_budget_per_request * 1_000_000 / rate)
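    # Sort documents by relevance, highest first (scores assumed pre-computed)
    ranked = sorted(documents, key=lambda d: d.get("relevance", 0.0), reverse=True)
    selected = []
    total_tokens = 0
    for doc in ranked: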
        doc_tokens = len(doc["content"].split()) * 2  # rough estimate
        if total_tokens + doc_tokens <= max_tokens:
            selected.append(doc)
            total_tokens += doc_tokens
        else:
            break
    actual_cost = total_tokens * rate / 1_000_000
    return {
        "selected_docs": len(selected),
        "total_docs": len(documents),
        "tokens": total_tokens,
        "estimated_cost": f"${actual_cost:.4f}",
        "budget": f"${max_budget_per_request:.2f}",
    }
# Example: Fit documents into a $0.50 Opus budget
docs = [
    {"content": "First relevant document... " * 500, "relevance": 0.95},
    {"content": "Second document... " * 500, "relevance": 0.90},
    {"content": "Third document... " * 500, "relevance": 0.85},
    {"content": "Fourth document... " * 5000, "relevance": 0.80},
]
result = fit_context_to_budget(docs, max_budget_per_request=0.50)
print(f"Selected {result['selected_docs']}/{result['total_docs']} docs at {result['estimated_cost']}")

The Tradeoffs

Haiku’s 200K window is sufficient for most tasks: typical code reviews (10-50K tokens), document Q&A (5-30K tokens), and conversation sessions under 20 turns. The 1M window is genuinely needed for full-codebase analysis, very long documents, and extended multi-tool sessions.

Choosing Haiku for cost savings means accepting lower capability on complex tasks. The 200K vs 1M decision is inseparable from the Haiku vs Sonnet/Opus capability decision.

If your workload splits between small-context and large-context requests, route small requests to Haiku and large requests to Sonnet. This captures savings on high-volume simple requests while maintaining capacity for occasional large-context analysis.
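A size-based router for this split can be very small. The sketch below assumes the 200K Haiku limit stated in this guide and reserves some headroom for the response, since output tokens share the context window with the input. The `headroom` default, the tier labels, and the sample sizes are assumptions chosen for illustration:

```python
from collections import Counter

HAIKU_WINDOW = 200_000  # Haiku 4.5 context window, per this guide


def route(input_tokens: int, headroom: int = 8_000) -> str:
    """Pick a model tier by input size, leaving headroom for the response."""
    if input_tokens + headroom <= HAIKU_WINDOW:
        return "haiku"
    return "sonnet"


# Hypothetical input sizes for five requests
request_sizes = [12_000, 45_000, 150_000, 195_000, 400_000]
routing = Counter(route(tokens) for tokens in request_sizes)
print(routing)
```

The 195K request goes to Sonnet even though it is nominally under 200K, because the reserved headroom would not fit. Size-based routing ignores task difficulty, so in practice it would be combined with a complexity check.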

Implementation Checklist

Sample 100 production requests and count their actual token usage
Calculate what percentage fit within 200K tokens
Route requests under 200K to Haiku if task complexity allows
Set per-request cost budgets that limit context filling
Monitor the percentage of requests that genuinely need more than 200K
Review routing rules quarterly

Measuring Impact

Track the distribution of context sizes across all requests. If 80%+ of requests use under 100K tokens, most of your traffic can run on Haiku at $1/MTok instead of Opus at $5/MTok. Calculate the monthly cost difference between your current model assignment and an optimized routing based on actual context needs. The savings scale with the share of input tokens, not just the share of requests, that small-context requests currently send to expensive models.
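The comparison between current and optimized routing can be estimated from one day of logged token counts. This is a sketch under stated assumptions: every request currently runs on Opus, requests at or under 100K tokens could move to Haiku, rates are the input rates quoted in this guide, and the month has 30 days. The traffic mix and all function names are hypothetical:

```python
def daily_input_cost(token_counts: list, rate_for) -> float:
    """Sum input cost in USD; rate_for maps a token count to a USD-per-MTok rate."""
    return sum(t * rate_for(t) / 1_000_000 for t in token_counts)


def current_rate(tokens: int) -> float:
    """Assumed current assignment: everything on Opus."""
    return 5.0


def routed_rate(tokens: int) -> float:
    """Assumed optimized assignment: small requests on Haiku, the rest on Opus."""
    return 1.0 if tokens <= 100_000 else 5.0


# Hypothetical day of traffic: 80 small requests and 20 large ones
day = [20_000] * 80 + [300_000] * 20

current = daily_input_cost(day, current_rate) * 30
routed = daily_input_cost(day, routed_rate) * 30
savings = current - routed

print(f"current ${current:,.0f}/month, routed ${routed:,.0f}/month, "
      f"savings ${savings:,.0f} ({savings / current:.1%})")
```

In this example 80% of requests are small, but they carry only about a fifth of the input tokens, so the savings are well below 80%. Measuring by tokens rather than by request count gives the realistic estimate.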


See Also

Why Is Claude Code Expensive: why context is the primary cost driver
Claude Code Context Window Management Guide: practical context management
Why Does Anthropic Limit Claude Code Context Window: the design reasoning behind context limits
