Fix Claude API Rate Limit Errors (HTTP 429)
TL;DR: HTTP 429 means you have exceeded your request or token rate limit. Implement exponential backoff, check your usage tier limits, and batch requests to stay within bounds.
The Problem
Your Claude API calls start failing with HTTP 429:
{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit exceeded. Please retry after X seconds."
  }
}
Why This Happens
Anthropic enforces rate limits across three dimensions:
- Requests per minute (RPM): How many API calls you can make per minute.
- Input tokens per minute (ITPM): Total input tokens consumed per minute.
- Output tokens per minute (OTPM): Total output tokens generated per minute.
Limits vary by usage tier (Tier 1, Tier 2, Tier 3, Tier 4, and Monthly Invoicing) and by model. Opus models typically have lower limits than Sonnet or Haiku. You can view your current tier and limits in the Anthropic Console.
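A rough estimate can show which of the three limits a steady workload hits first. The sketch below is illustrative only: the limit values and per-request token averages are made-up placeholders rather than real tier limits, and `Limits` and `max_sustained_rpm` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    rpm: int   # requests per minute
    itpm: int  # input tokens per minute
    otpm: int  # output tokens per minute


def max_sustained_rpm(limits, avg_input_tokens, avg_output_tokens):
    """Return (requests_per_minute, binding_limit) for a steady workload."""
    candidates = {
        "RPM": float(limits.rpm),
        "ITPM": limits.itpm / avg_input_tokens,
        "OTPM": limits.otpm / avg_output_tokens,
    }
    # The smallest of the three is the one that triggers 429s first.
    binding = min(candidates, key=candidates.get)
    return candidates[binding], binding


# Placeholder numbers for illustration; read your real limits from the Console.
example = Limits(rpm=50, itpm=40_000, otpm=8_000)
rate, binding = max_sustained_rpm(example, avg_input_tokens=2_000, avg_output_tokens=500)
print(f"~{rate:.0f} requests/min, limited by {binding}")
```

With these placeholder values the output-token limit binds at about 16 requests per minute, well before the 50 RPM cap.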
The Fix
Step 1 — Check Your Current Limits and Usage
# Make a request and inspect the rate limit headers
curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5-20250929","max_tokens":32,"messages":[{"role":"user","content":"hi"}]}' \
  -o /dev/null 2>&1 | grep -i "rate\|limit\|retry"
Key rate limit response headers:
- anthropic-ratelimit-requests-limit: Your RPM cap.
- anthropic-ratelimit-requests-remaining: Requests you can still make before being rate limited.
- anthropic-ratelimit-requests-reset: When the request limit will be fully replenished (RFC 3339 timestamp).
- anthropic-ratelimit-input-tokens-limit: Your ITPM cap.
- anthropic-ratelimit-input-tokens-remaining: Input tokens remaining.
- anthropic-ratelimit-input-tokens-reset: When the input token limit will be fully replenished.
- anthropic-ratelimit-output-tokens-limit: Your OTPM cap.
- anthropic-ratelimit-output-tokens-remaining: Output tokens remaining.
- anthropic-ratelimit-output-tokens-reset: When the output token limit will be fully replenished.
- retry-after: Seconds to wait before retrying (sent on 429 responses).
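To act on these headers in code rather than by eye, they can be collapsed into a small summary. This is a minimal sketch: `parse_rate_limit_headers` is a hypothetical helper, the sample values are invented, and it assumes the headers arrive as a mapping with lowercase names and that the reset values are RFC 3339 timestamps.

```python
from datetime import datetime, timezone


def parse_rate_limit_headers(headers, now=None):
    """Summarize rate limit headers from a mapping with lowercase header names."""
    now = now or datetime.now(timezone.utc)
    result = {}
    for kind in ("requests", "input-tokens", "output-tokens"):
        prefix = f"anthropic-ratelimit-{kind}-"
        limit = headers.get(prefix + "limit")
        remaining = headers.get(prefix + "remaining")
        reset = headers.get(prefix + "reset")
        if limit is None or remaining is None:
            continue  # Header not present on this response
        entry = {"limit": int(limit), "remaining": int(remaining)}
        if reset is not None:
            # Older Python versions reject a trailing "Z", so normalize it first.
            reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
            entry["seconds_until_reset"] = max(0.0, (reset_at - now).total_seconds())
        result[kind] = entry
    return result


# Invented sample values, with a fixed "now" so the result is reproducible.
sample = {
    "anthropic-ratelimit-requests-limit": "50",
    "anthropic-ratelimit-requests-remaining": "48",
    "anthropic-ratelimit-requests-reset": "2025-01-01T00:00:30Z",
}
fixed_now = datetime(2025, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
summary = parse_rate_limit_headers(sample, now=fixed_now)
print(summary)
```

With the Python SDK, the raw headers should be reachable through `client.messages.with_raw_response.create(...)`, whose result exposes `.headers`; check this against the SDK version you have installed.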
Step 2 — Implement Proper Retry Logic
Python SDK (automatic retries built in):
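import time

import anthropic

# The SDK retries 429s (and other transient errors) on its own with backoff.
# max_retries=5 raises that from the default of 2.
client = anthropic.Anthropic(max_retries=5)


def call_with_backoff(messages, max_attempts=5):
    """Outer retry loop, entered only after the SDK's own retries are exhausted."""
    for attempt in range(max_attempts):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5-20250929",
                max_tokens=1024,
                messages=messages,
            )
        except anthropic.RateLimitError as e:
            if attempt == max_attempts - 1:
                raise
            # Prefer the server's retry-after hint; fall back to exponential backoff.
            retry_after = e.response.headers.get("retry-after")
            try:
                wait = float(retry_after)
            except (TypeError, ValueError):
                wait = 2 ** attempt
            wait = min(wait, 60)  # Cap at 60s
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
    return None  # Reached only if max_attempts < 1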
TypeScript SDK:
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ maxRetries: 5 });

async function callWithBackoff(
  messages: Anthropic.MessageParam[],
  maxAttempts = 5
): Promise<Anthropic.Message | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.messages.create({
        model: "claude-sonnet-4-5-20250929",
        max_tokens: 1024,
        messages,
      });
    } catch (error) {
      if (error instanceof Anthropic.RateLimitError) {
        if (attempt === maxAttempts - 1) throw error;
        // Exponential backoff in milliseconds, capped at 60s
        const wait = Math.min(2 ** attempt * 1000, 60000);
        await new Promise((r) => setTimeout(r, wait));
      } else {
        throw error;
      }
    }
  }
  return null;
}
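When many workers back off on the same schedule, they tend to retry at the same instant and collide again. A common refinement is to randomize the delay ("full jitter") and to let a server-supplied retry-after value take priority. The sketch below shows only the delay calculation; `backoff_delay` and its parameters are hypothetical names, not part of either SDK.

```python
import random


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to sleep before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Wait at least as long as the server asked for.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Unparseable header value; fall back to computed backoff
    ceiling = min(cap, base * (2 ** attempt))
    # Full jitter: pick a random point in [0, ceiling) so clients spread out.
    return ceiling * rng()


print(backoff_delay(0, retry_after="12"))  # server hint wins: 12.0
print(backoff_delay(3))                    # random value below 8 seconds
```

Passing `rng` explicitly makes the function deterministic in tests; in production the default `random.random` is enough.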
Step 3 — Batch Requests Efficiently
If you are processing many items, cap the number of in-flight requests with a semaphore:
import asyncio

import anthropic

MAX_CONCURRENT = 5  # Caps in-flight requests; this limits concurrency, not RPM directly
client = anthropic.AsyncAnthropic(max_retries=3)


async def process_item(item, semaphore):
    async with semaphore:
        return await client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=512,
            messages=[{"role": "user", "content": item}],
        )


async def main():
    items = ["task1", "task2", "task3"]  # Your work items
    # Create the semaphore inside the running event loop
    semaphore = asyncio.Semaphore(MAX_CONCURRENT)
    results = await asyncio.gather(
        *[process_item(item, semaphore) for item in items],
        return_exceptions=True,
    )
    # return_exceptions=True puts failures in the results list instead of raising
    for item, r in zip(items, results):
        if isinstance(r, Exception):
            print(f"Failed: {item!r}: {r}")
    return results


if __name__ == "__main__":
    asyncio.run(main())
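A semaphore bounds how many requests are in flight, but fast responses can still push the request rate past the RPM cap. One way to bound the rate itself is to space out request starts. The sketch below is one possible approach, not an SDK feature: `RequestPacer` is a hypothetical class, and the `clock` and `sleep` parameters exist only so the behavior can be tested without real waiting.

```python
import asyncio
import time


class RequestPacer:
    """Spaces out request starts so at most `rpm` begin per minute."""

    def __init__(self, rpm, clock=time.monotonic, sleep=asyncio.sleep):
        self.interval = 60.0 / rpm
        self._clock = clock
        self._sleep = sleep
        self._next_slot = clock()
        self._lock = asyncio.Lock()

    async def wait_turn(self):
        """Wait until this caller's slot arrives; return the seconds waited."""
        # Reserve the next free slot under the lock, then sleep outside it
        # so other callers can reserve their own slots in the meantime.
        async with self._lock:
            now = self._clock()
            slot = max(now, self._next_slot)
            self._next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            await self._sleep(delay)
        return delay
```

To combine it with the semaphore, create one `RequestPacer` inside `main()` and call `await pacer.wait_turn()` immediately before `client.messages.create(...)`. Set `rpm` somewhat below your actual cap to leave headroom for SDK retries.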
Step 4 — Verify Rate Limits Are Not Exhausted
# Check remaining quota from response headers
curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5-20250929","max_tokens":16,"messages":[{"role":"user","content":"1"}]}' \
  2>&1 | grep -i "ratelimit-requests-remaining"
Example output (the number depends on your tier and recent usage):
anthropic-ratelimit-requests-remaining: 48
Common Variations
| Scenario | Cause | Quick Fix |
|---|---|---|
| 429 only with Opus | Lower per-model RPM/ITPM limits | Switch to Sonnet for high-throughput tasks |
| 429 in parallel workers | Too many simultaneous requests hitting RPM | Add a semaphore / rate limiter |
| Consistent 429 at start of each minute | ITPM or OTPM exhausted before RPM | Reduce prompt size or output max_tokens |
| Need higher limits | Current tier caps too low | Upgrade usage tier in the Anthropic Console |
Prevention
- Use Haiku for high-volume tasks: Haiku typically has higher rate limits than Opus or Sonnet at the same tier. Confirm the exact figures for your account in the Anthropic Console.
- Cache responses: Avoid redundant API calls by caching deterministic responses locally.
- Monitor usage: Track your anthropic-ratelimit-*-remaining headers in dashboards to catch limits before they hit.
- Read the retry-after header: On a 429 response, wait at least the number of seconds specified rather than guessing.
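The caching advice above can be sketched as a small in-memory cache keyed by a hash of the request parameters. This is an illustration under assumptions: `ResponseCache` and `fake_create` are hypothetical names, the cache is per-process and never expires entries, and it only makes sense where an identical request may reuse an earlier answer.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed by a hash of the request parameters."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    @staticmethod
    def _key(params):
        # sort_keys makes the key independent of dict ordering.
        blob = json.dumps(params, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def get_or_call(self, params, call):
        """Return the cached result for params, calling call(**params) on a miss."""
        key = self._key(params)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = call(**params)
        self._store[key] = result
        return result


# Stand-in for the real API call so the example runs offline.
calls = []

def fake_create(**params):
    calls.append(params)
    return f"response #{len(calls)}"


cache = ResponseCache()
params = {
    "model": "example-model",  # placeholder, not a real model ID
    "max_tokens": 16,
    "messages": [{"role": "user", "content": "hi"}],
}
first = cache.get_or_call(params, fake_create)
second = cache.get_or_call(params, fake_create)
print(first, second, len(calls))
```

In real use, `call` would be `client.messages.create`. The second lookup is served from memory, so it consumes no requests or tokens from your rate limits.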
Related Issues
- Fix Claude API Error 500 — Internal Server Error — Server-side errors
- Fix Claude API Error 401 — Authentication Failed — API key issues
- Claude API Error 400 Invalid Request Fix — Browse all API error guides
Last verified: 2026-04-15. Found an issue? Open a GitHub issue.