Anthropic’s rate limits scale with your usage tier. As you spend more, your limits automatically increase. Tier RPM TPM (Input) TPM (Output) Requirement Free 5 20,000 4,000 Sign up Build (Tier 1)...

Fix Claude Rate Exceeded Error (2026)

Last updated: April 20, 2026

The Error

Error 429: Rate Limit Exceeded
{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Number of request tokens has exceeded your per-minute rate limit"
  }
}

A rate exceeded error means you have hit one of Anthropic’s usage limits. Your API key is valid and the service is operational, but you have sent too many requests or too many tokens within a time window. This is the single most common error developers encounter when scaling Claude API usage.

Error Pattern Matcher

Paste your error message below to identify which rate limit you hit.

The 5 Types of Rate Limits

Anthropic enforces five distinct rate limits. Each has different triggers, different headers, and different solutions.

1. Requests Per Minute (RPM)

The simplest limit: too many API calls in a 60-second window, regardless of their size.

How to detect it:

# Check your current RPM allocation
curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","max_tokens":10,"messages":[{"role":"user","content":"test"}]}' \
  2>&1 | grep -i "anthropic-ratelimit-requests"

Look for these headers in the response:

anthropic-ratelimit-requests-limit – your RPM cap
anthropic-ratelimit-requests-remaining – requests left this window
anthropic-ratelimit-requests-reset – when the window resets (ISO 8601)

Fix: Add request queuing with a rate limiter:

import time
import threading
class RateLimiter:
    def __init__(self, max_rpm: int):
        self.max_rpm = max_rpm
        self.tokens = max_rpm
        self.lock = threading.Lock()
        self.last_refill = time.monotonic()
    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                elapsed = now - self.last_refill
                self.tokens = min(
                    self.max_rpm,
                    self.tokens + elapsed * (self.max_rpm / 60)
                )
                self.last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
            time.sleep(0.1)
# Usage
limiter = RateLimiter(max_rpm=50)  # Adjust to your tier
for prompt in prompts:
    limiter.acquire()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )

2. Tokens Per Minute (TPM)

You have sent too many input or output tokens within a 60-second window. Large prompts burn through this limit fast.

How to detect it:

The error message will specifically mention tokens:

"Number of request tokens has exceeded your per-minute rate limit"

Response headers:

anthropic-ratelimit-tokens-limit
anthropic-ratelimit-tokens-remaining
anthropic-ratelimit-tokens-reset

Fix: Estimate token usage before sending and throttle accordingly:

import anthropic
client = anthropic.Anthropic()
def estimate_and_send(messages, model="claude-sonnet-4-20250514", max_tokens=1024):
    # Count input tokens first
    count = client.count_tokens(model=model, messages=messages)
    total_estimated = count.input_tokens + max_tokens
    print(f"Estimated total tokens: {total_estimated}")
    # If this would exceed your budget, wait or split
    if total_estimated > 40000:
        print("Large request — consider splitting")
    return client.messages.create(
        model=model,
        max_tokens=max_tokens,
        messages=messages
    )

3. Tokens Per Day (TPD)

A daily cap on total token usage. Once exhausted, all requests fail until the 24-hour window resets.

How to detect it:

"You have exceeded your daily token limit"

Fix: Implement daily usage tracking. See the detailed guide on token-per-day limits.

import json
from pathlib import Path
from datetime import date
USAGE_FILE = Path("token_usage.json")
def track_usage(input_tokens: int, output_tokens: int):
    today = str(date.today())
    usage = json.loads(USAGE_FILE.read_text()) if USAGE_FILE.exists() else {}
    day_usage = usage.get(today, {"input": 0, "output": 0})
    day_usage["input"] += input_tokens
    day_usage["output"] += output_tokens
    usage[today] = day_usage
    USAGE_FILE.write_text(json.dumps(usage, indent=2))
    return day_usage
def check_budget(daily_limit: int = 1_000_000) -> bool:
    today = str(date.today())
    usage = json.loads(USAGE_FILE.read_text()) if USAGE_FILE.exists() else {}
    day_usage = usage.get(today, {"input": 0, "output": 0})
    total = day_usage["input"] + day_usage["output"]
    return total < daily_limit

4. Concurrent Requests

Too many in-flight requests at the same time. This limit prevents any single API key from monopolizing server resources.

How to detect it:

"Too many concurrent requests"

Fix: Use a semaphore to cap parallelism:

import asyncio
import anthropic
MAX_CONCURRENT = 5  # Adjust to your tier

async def process_batch(prompts: list[str]):
    client = anthropic.AsyncAnthropic()
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    async def send_one(prompt: str):
        async with semaphore:
            return await client.messages.create(
                model="claude-sonnet-4-20250514",
                max_tokens=1024,
                messages=[{"role": "user", "content": prompt}]
            )
    tasks = [send_one(p) for p in prompts]
    return await asyncio.gather(*tasks)

TypeScript equivalent:

import Anthropic from "@anthropic-ai/sdk";
import pLimit from "p-limit";
const client = new Anthropic();
const limit = pLimit(5); // Max 5 concurrent
async function processBatch(prompts: string[]) {
  const tasks = prompts.map((prompt) =>
    limit(() =>
      client.messages.create({
        model: "claude-sonnet-4-20250514",
        max_tokens: 1024,
        messages: [{ role: "user", content: prompt }],
      })
    )
  );
  return Promise.all(tasks);
}

5. Monthly Spend Limits

Anthropic enforces per-organization spending caps. When reached, all API calls fail until the next billing cycle or until you request a limit increase.

How to detect it:

"Monthly spend limit reached"

Fix: Monitor spending in the Anthropic Console under Settings > Billing. Set up alerts at 50%, 75%, and 90% thresholds. Request a limit increase through the Console if your usage justifies it.

Rate Limit Tiers

Anthropic’s rate limits scale with your usage tier. As you spend more, your limits automatically increase.

Tier	RPM	TPM (Input)	TPM (Output)	Requirement
Free	5	20,000	4,000	Sign up
Build (Tier 1)	50	40,000	8,000	$5 spend
Build (Tier 2)	1,000	80,000	16,000	$40 spend
Build (Tier 3)	2,000	160,000	32,000	$200 spend
Build (Tier 4)	4,000	400,000	80,000	$1,000 spend
Scale	Custom	Custom	Custom	Contact sales
Enterprise	Custom	Custom	Custom	Annual contract

These numbers are for claude-sonnet-4-20250514. Limits vary by model – Haiku typically has higher RPM allowances, and Opus has lower ones.

Check your current tier:

# Your tier is visible in the Anthropic Console
# Direct API check: look at the rate limit headers from any successful response
curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","max_tokens":10,"messages":[{"role":"user","content":"test"}]}' \
  2>&1 | grep "anthropic-ratelimit"

Rate Limit Tiers by Plan: Complete Reference

Anthropic enforces different rate limits depending on your subscription plan (Claude.ai) or your API spending tier. This table consolidates every known limit across all access methods.

Claude.ai and Claude Code Subscription Plans

Plan	Monthly Cost	Claude Code Access	Messages/Day (Estimate)	Throttle Behavior
Free	$0	No	~30 messages	Hard cutoff, resets daily
Pro	$20	Limited	~100-150 messages	Soft throttle, slower responses after limit
Max 5x	$100	Full (heavy)	~500+ messages	5x Pro rate, extended thinking included
Max 20x	$200	Full (very heavy)	~2,000+ messages	20x Pro rate, priority during peak
Team	$30/user	Full (team)	~150/user	Per-seat limits, admin controls
Enterprise	Custom	Full (managed)	Custom	SLA-backed, dedicated capacity

Note: Claude.ai message limits are approximate and vary by model and message complexity. Anthropic does not publish exact per-message quotas for subscription plans.

API Tier Limits (Detailed)

Tier	Unlock Spend	RPM	Input TPM	Output TPM	Max Concurrent	Daily Token Cap
Free	$0	5	20,000	4,000	1	300,000
Tier 1	$5	50	40,000	8,000	5	1,000,000
Tier 2	$40	1,000	80,000	16,000	10	2,500,000
Tier 3	$200	2,000	160,000	32,000	20	5,000,000
Tier 4	$1,000	4,000	400,000	80,000	50	Unlimited
Scale	Contact sales	Custom	Custom	Custom	Custom	Custom

These limits apply to claude-sonnet-4-20250514. Different models have different limits:

Model	RPM Modifier	TPM Modifier
Claude Opus 4	0.5x (half the RPM)	0.5x
Claude Sonnet 4	1x (baseline)	1x
Claude Haiku 4.5	2x (double the RPM)	2x

So if your tier allows 1,000 RPM for Sonnet, you get ~500 RPM for Opus and ~2,000 RPM for Haiku.

How to Check Your Current Tier and Remaining Quota

Method 1: Anthropic Console Log into console.anthropic.com, go to Settings > Plans. Your current tier and usage are displayed.

Method 2: API Response Headers Every successful API response includes your current limits and remaining quota:

curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-20250514","max_tokens":5,"messages":[{"role":"user","content":"test"}]}' \
  2>&1 | grep "anthropic-ratelimit"

Output:

anthropic-ratelimit-requests-limit: 1000
anthropic-ratelimit-requests-remaining: 998
anthropic-ratelimit-requests-reset: 2026-04-24T15:01:00Z
anthropic-ratelimit-tokens-limit: 80000
anthropic-ratelimit-tokens-remaining: 79200
anthropic-ratelimit-tokens-reset: 2026-04-24T15:01:00Z

Method 3: Programmatic Tier Detection

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=5,
    messages=[{"role": "user", "content": "test"}]
)
headers = response._response.headers
rpm_limit = int(headers.get("anthropic-ratelimit-requests-limit", 0))
tier_map = {5: "Free", 50: "Tier 1", 1000: "Tier 2", 2000: "Tier 3", 4000: "Tier 4"}
current_tier = tier_map.get(rpm_limit, f"Custom ({rpm_limit} RPM)")
remaining_rpm = headers.get("anthropic-ratelimit-requests-remaining")
reset_time = headers.get("anthropic-ratelimit-requests-reset")
print(f"Current tier: {current_tier}")
print(f"RPM remaining: {remaining_rpm}")
print(f"Resets at: {reset_time}")

When Rate Limits Reset

Limit Type	Reset Window	Reset Behavior
RPM (Requests/Min)	60 seconds	Rolling window from your first request in the period
TPM (Tokens/Min)	60 seconds	Rolling window, input + output tokens combined
TPD (Tokens/Day)	~24 hours	Resets approximately 24 hours from your first request of the day
Concurrent	Immediate	Frees up as soon as an in-flight request completes
Monthly Spend	Billing cycle	Resets on your billing renewal date

The per-minute windows are rolling, not fixed. This means sending 50 requests at 2:00:00 PM does not guarantee 50 more requests at 2:01:00 PM. The window tracks the trailing 60 seconds continuously. Space requests evenly to avoid bursting against the limit.

Rate Limit Response Headers

Every successful API response includes rate limit headers. Parse them to build proactive throttling:

import anthropic
client = anthropic.Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello"}]
)
# Access headers from the raw response
# The Python SDK exposes these on the response object
print(f"Requests remaining: {response._response.headers.get('anthropic-ratelimit-requests-remaining')}")
print(f"Tokens remaining: {response._response.headers.get('anthropic-ratelimit-tokens-remaining')}")
print(f"Resets at: {response._response.headers.get('anthropic-ratelimit-tokens-reset')}")

When a 429 response arrives, the retry-after header tells you exactly how many seconds to wait:

import anthropic
import time
try:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )
except anthropic.RateLimitError as e:
    retry_after = int(e.response.headers.get("retry-after", 60))
    print(f"Rate limited. Waiting {retry_after} seconds.")
    time.sleep(retry_after)

Exponential Backoff Implementation

The correct retry strategy for 429 errors uses exponential backoff with jitter and respects the retry-after header.

Python

import anthropic
import time
import random
def call_with_backoff(client, max_retries=8, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.messages.create(**kwargs)
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Prefer retry-after header if available
            retry_after = e.response.headers.get("retry-after")
            if retry_after:
                delay = int(retry_after)
            else:
                delay = min(
                    (2 ** attempt) + random.uniform(0, 1),
                    120  # Cap at 2 minutes
                )
            print(f"Rate limited (attempt {attempt + 1}), "
                  f"waiting {delay:.1f}s")
            time.sleep(delay)
    raise RuntimeError("Max retries exhausted")

TypeScript

import Anthropic from "@anthropic-ai/sdk";
async function callWithBackoff(
  client: Anthropic,
  params: Anthropic.MessageCreateParams,
  maxRetries = 8
): Promise<Anthropic.Message> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await client.messages.create(params);
    } catch (err) {
      if (
        err instanceof Anthropic.RateLimitError &&
        attempt < maxRetries - 1
      ) {
        const retryAfter = err.headers?.get("retry-after");
        const delay = retryAfter
          ? parseInt(retryAfter, 10) * 1000
          : Math.min(Math.pow(2, attempt) * 1000 + Math.random() * 1000, 120000);
        console.log(
          `Rate limited (attempt ${attempt + 1}), waiting ${(delay / 1000).toFixed(1)}s`
        );
        await new Promise((r) => setTimeout(r, delay));
        continue;
      }
      throw err;
    }
  }
  throw new Error("Max retries exhausted");
}

Token Budgeting Strategies

Proactive token management prevents rate limit errors entirely.

Pre-flight Token Estimation

import anthropic
client = anthropic.Anthropic()
def budget_aware_call(messages, max_tokens=1024, budget_remaining=40000):
    """Only send if we have budget remaining in this window."""
    count = client.count_tokens(
        model="claude-sonnet-4-20250514",
        messages=messages
    )
    estimated_total = count.input_tokens + max_tokens
    if estimated_total > budget_remaining:
        raise ValueError(
            f"Request needs ~{estimated_total} tokens, "
            f"but only {budget_remaining} remain in this window"
        )
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=max_tokens,
        messages=messages
    )

Sliding Window Token Tracker

import time
from collections import deque
class TokenWindow:
    def __init__(self, tpm_limit: int, window_seconds: int = 60):
        self.tpm_limit = tpm_limit
        self.window = window_seconds
        self.usage = deque()  # (timestamp, tokens)

    def record(self, tokens: int):
        self.usage.append((time.monotonic(), tokens))
        self._prune()
    def available(self) -> int:
        self._prune()
        used = sum(t for _, t in self.usage)
        return max(0, self.tpm_limit - used)
    def _prune(self):
        cutoff = time.monotonic() - self.window
        while self.usage and self.usage[0][0] < cutoff:
            self.usage.popleft()
# Usage
window = TokenWindow(tpm_limit=80000)
for prompt in prompts:
    estimated = estimate_tokens(prompt)
    if window.available() < estimated:
        wait_time = 60 - (time.monotonic() - window.usage[0][0])
        time.sleep(max(0, wait_time))
    response = client.messages.create(...)
    total_tokens = response.usage.input_tokens + response.usage.output_tokens
    window.record(total_tokens)

Circuit Breaker Pattern for Production Systems

Simple retry logic is not enough for production applications. When the API is persistently rate-limited, retrying every request wastes time and compute. A circuit breaker stops sending requests entirely when failure rates are high, then gradually resumes.

import time
import threading
from enum import Enum
class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Blocking all requests
    HALF_OPEN = "half_open"  # Testing if API recovered

class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.state = CircuitState.CLOSED
        self.failure_count = 0
        self.last_failure_time = 0
        self.lock = threading.Lock()
    def can_execute(self) -> bool:
        with self.lock:
            if self.state == CircuitState.CLOSED:
                return True
            if self.state == CircuitState.OPEN:
                if time.monotonic() - self.last_failure_time > self.recovery_timeout:
                    self.state = CircuitState.HALF_OPEN
                    return True
                return False
            # HALF_OPEN: allow one test request
            return True
    def record_success(self):
        with self.lock:
            self.failure_count = 0
            self.state = CircuitState.CLOSED
    def record_failure(self):
        with self.lock:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            if self.failure_count >= self.failure_threshold:
                self.state = CircuitState.OPEN
# Usage with Anthropic SDK
import anthropic
client = anthropic.Anthropic()
breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=30)
def call_with_circuit_breaker(messages, **kwargs):
    if not breaker.can_execute():
        raise RuntimeError("Circuit breaker OPEN — API rate limited, waiting for recovery")
    try:
        result = client.messages.create(messages=messages, **kwargs)
        breaker.record_success()
        return result
    except anthropic.RateLimitError:
        breaker.record_failure()
        raise

The circuit breaker transitions through three states: CLOSED (normal), OPEN (all requests blocked), and HALF_OPEN (one test request allowed). This prevents your application from hammering a rate-limited API and gives the rate limit window time to reset.

These diagnostic steps are from The Claude Code Playbook — 200 production-ready templates including error prevention rules and CLAUDE.md configs tested across 50+ project types.

Cost of Rate Limiting vs Batching

When you regularly hit rate limits, consider whether the Message Batches API is more cost-effective than real-time requests with retry logic.

Approach	Cost per 1M tokens (input)	Latency	Rate limit impact
Real-time API	$3.00 (Sonnet)	1-10 seconds	Counts against RPM and TPM
Real-time + retries	$3.00 + wasted compute on failed attempts	Variable (retries add delay)	Exacerbates the problem
Message Batches API	$1.50 (Sonnet, 50% discount)	Minutes to hours	Does not count against real-time limits

For workloads that do not need immediate responses (nightly code analysis, bulk data processing, report generation), batching is strictly better: half the cost and zero rate limit pressure on your real-time quota.

For workloads that need responses within seconds (user-facing chat, interactive coding, CI pipeline steps), real-time with a circuit breaker is the right pattern.

Upgrading Rate Limits

Automatic Tier Progression

Anthropic automatically upgrades your tier based on cumulative spend. No action required – once you cross a spend threshold, your limits increase within 24 hours.

Requesting Custom Limits

For Scale and Enterprise tiers, contact Anthropic sales through the Console. Provide:

Your current usage patterns (RPM, TPM averages and peaks)
Your projected growth over the next 3 months
Your use case (batch processing, real-time chat, agent workflows)

Batching as a Rate Limit Bypass

The Message Batches API processes requests asynchronously at 50% cost and does not count against your real-time rate limits:

# Batch API for non-urgent workloads
batch = client.batches.create(
    requests=[
        {
            "custom_id": f"req_{i}",
            "params": {
                "model": "claude-sonnet-4-20250514",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}]
            }
        }
        for i, prompt in enumerate(prompts)
    ]
)
print(f"Batch created: {batch.id}")

Claude Code and Rate Limits

Claude Code manages rate limits internally. When it hits a 429, it automatically waits and retries. You will see a message like:

Waiting for rate limit to reset (retrying in 32s)...

To reduce how often this happens:

Use /compact frequently. Long conversations accumulate tokens. Compacting summarizes history and reduces the token payload on each subsequent request.
Avoid unnecessary file reads. Each file Claude Code reads adds to the input token count. Be specific about which files you want it to examine.
Set max_tokens in CLAUDE.md. This does not directly control Claude Code’s behavior but signals intent for any API wrappers.
Switch models for simple tasks. Use claude --model claude-haiku-4-20250514 for tasks that do not need deep reasoning. Haiku has higher rate limits and costs fewer tokens.

See the 429 retry-after guide for Claude Code specific troubleshooting.

When “Rate Exceeded” Is Actually Overload (529)

If you receive a 529 error instead of 429, the issue is not your rate limit – it is Anthropic’s capacity. The 529 means the model is saturated across all users. Your retry strategy should be longer delays (start at 30 seconds) and you should consider falling back to a smaller model.

The key difference:

429: You specifically have exceeded your allocation. Other users may be fine.
529: The entire model is overloaded. All users are affected.

CLAUDE.md Rules to Reduce Token Usage

# Token Efficiency
- Always use the smallest model that can handle the task
- Prefer targeted file reads over directory-wide searches
- Summarize long outputs before including them in follow-up prompts
- Use structured output (JSON) to reduce response verbosity
- Batch related questions into single prompts instead of multiple calls
- Set explicit max_tokens limits in every API call
- Use system prompts for persistent instructions (avoids repeating in user messages)

FAQ

How do I check my current rate limit tier?

Log into the Anthropic Console. Navigate to Settings > Plans to see your current tier. Alternatively, inspect the anthropic-ratelimit-requests-limit header from any successful API response.

Do rate limits apply per API key or per organization?

Rate limits apply at the organization level. Multiple API keys within the same organization share the same rate limit pool. Creating additional API keys does not increase your limits.

Does the Anthropic Python SDK handle rate limits automatically?

The official SDK includes automatic retry logic with exponential backoff for 429 responses. However, the defaults may not suit your use case. For high-volume applications, implement custom retry logic as shown in the examples above.

How do I handle rate limits in a multi-user application?

Implement request queuing at the application level. Use a shared rate limiter (Redis-backed for distributed systems) that tracks usage across all users and delays requests that would exceed the limit.

Can I get higher rate limits without paying more?

No. Rate limits are tied to your spending tier. The only ways to increase them are to spend more (triggering automatic tier upgrades) or to contact Anthropic sales for a custom enterprise agreement.

What happens if I hit the daily token limit?

All API requests will return 429 until the 24-hour window resets. The reset time is based on when your first request of the day was made, not midnight UTC. Plan your workloads to spread token usage across the full day.

Are rate limits different for streaming vs non-streaming requests?

No. Streaming and non-streaming requests count equally against RPM and TPM limits. Streaming does not provide any rate limit advantage, though it can improve perceived latency.

How do rate limits work with tool use and function calling?

Tool definitions count as input tokens. Each tool in your request schema adds tokens to the input count. If you define 20 tools, you may be sending 5,000-10,000 extra tokens per request. Prune unused tools to stay within TPM limits. See the tool token cost guide for details.

Does prompt caching help with rate limits?

Yes. Prompt caching reduces the number of input tokens counted against your TPM limit for cached content. When your system prompt and conversation history are cached, subsequent requests consume fewer tokens from your per-minute allowance, effectively increasing your throughput without upgrading your tier.

Can I use multiple API keys to bypass rate limits?

No. Rate limits are enforced at the organization level, not per API key. Creating additional keys within the same organization shares the same limit pool. The correct approach is to upgrade your tier through increased spend or contact Anthropic sales for custom limits.

Estimate usage → Calculate your token consumption with our Token Estimator.

Try it: Estimate your monthly spend with our Cost Calculator.

Fix Claude Code Rate Limit 429 Retry-After
Fix Claude Code Tokens Per Day Limit
Fix Claude API 529 Overloaded Error
Fix Claude API 503 Service Unavailable
Fix Claude Internal Server Error
The Claude Code Playbook
Fix Claude Code ETIMEOUT Corporate Proxy
Fix Claude Code Docker Cannot Reach API Endpoint
Fix Claude AI Rate Exceeded Error
Claude not working right now fix — troubleshoot when Claude is down
Claude 5-hour usage limit — understand rolling usage windows
Claude extra usage cost — what overages actually cost

The Error

Error Pattern Matcher

The 5 Types of Rate Limits

1. Requests Per Minute (RPM)

2. Tokens Per Minute (TPM)

3. Tokens Per Day (TPD)

4. Concurrent Requests

5. Monthly Spend Limits

Rate Limit Tiers

Rate Limit Tiers by Plan: Complete Reference

Claude.ai and Claude Code Subscription Plans

API Tier Limits (Detailed)

How to Check Your Current Tier and Remaining Quota

When Rate Limits Reset

Rate Limit Response Headers

Exponential Backoff Implementation

Python

TypeScript

Token Budgeting Strategies

Pre-flight Token Estimation

Sliding Window Token Tracker

Circuit Breaker Pattern for Production Systems

Cost of Rate Limiting vs Batching

Upgrading Rate Limits

Automatic Tier Progression

Requesting Custom Limits

Batching as a Rate Limit Bypass

Claude Code and Rate Limits

When “Rate Exceeded” Is Actually Overload (529)

CLAUDE.md Rules to Reduce Token Usage

FAQ

How do I check my current rate limit tier?

Do rate limits apply per API key or per organization?

Does the Anthropic Python SDK handle rate limits automatically?

How do I handle rate limits in a multi-user application?

Can I get higher rate limits without paying more?

What happens if I hit the daily token limit?

Are rate limits different for streaming vs non-streaming requests?

How do rate limits work with tool use and function calling?

Does prompt caching help with rate limits?

Can I use multiple API keys to bypass rate limits?

Related Guides

About the Author

Related Guides