Claude API Error 429 rate_limit_error Fix
When you hit the Claude API rate limit, the API returns a 429 status code with a rate_limit_error type. This guide explains exactly why it happens and how to handle it in your code.
The Error
{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Your account has hit a rate limit."
  },
  "request_id": "req_018EeWyXxfu5pfWkrYcMdjWG"
}
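If you call the API over raw HTTP rather than through an SDK, you may want to tell this error apart from other failures before deciding to retry. The following is a minimal sketch that checks a response against the shape shown above; the helper name `is_rate_limit_error` is hypothetical and not part of any SDK.

```python
import json


def is_rate_limit_error(status_code, body_text):
    """Return True when a response matches the 429 rate_limit_error shape shown above."""
    if status_code != 429:
        return False
    try:
        body = json.loads(body_text)
    except json.JSONDecodeError:
        return False  # not JSON, so not the documented error shape
    if not isinstance(body, dict):
        return False
    error = body.get("error")
    return isinstance(error, dict) and error.get("type") == "rate_limit_error"
```

Checking both the status code and the error type keeps a 429 separate from other retryable failures, such as the 529 overloaded_error listed under Related Guides.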
Quick Fix
- Check the retry-after response header for how long to wait.
- Enable the SDK’s built-in retry mechanism (on by default with 2 retries).
- Monitor rate limit headers to throttle before hitting the limit.
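If you manage retries yourself instead of relying on the SDK, the first item above can be sketched roughly as follows. This is an illustration, not the SDK's internal logic: the function name `retry_delay`, the 1-second base, and the 60-second cap are assumptions, and it assumes `retry-after` carries a number of seconds.

```python
def retry_delay(headers, attempt, base=1.0, cap=60.0):
    """Return seconds to wait before retry number `attempt` (0-based).

    Prefers the server's retry-after header. If it is missing or cannot be
    parsed, falls back to capped exponential backoff. `base` and `cap` are
    illustrative defaults, not values taken from the API.
    """
    raw = headers.get("retry-after")
    if raw is not None:
        try:
            return max(0.0, float(raw))
        except ValueError:
            pass  # unparseable header: fall through to backoff
    return min(cap, base * (2 ** attempt))
```

With a plain dict the header key must be lowercase; HTTP client libraries usually expose case-insensitive header mappings. Production code often adds random jitter to the fallback delay so that many clients do not retry at the same instant.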
What Causes This
The Claude API enforces three types of rate limits per tier:
- RPM (Requests Per Minute): Maximum number of API calls per minute.
- ITPM (Input Tokens Per Minute): Maximum input tokens processed per minute.
- OTPM (Output Tokens Per Minute): Maximum output tokens generated per minute.
Limits vary by spend tier. At Tier 4, Claude Opus 4.x allows 4,000 RPM, 2,000,000 ITPM, and 400,000 OTPM. The Opus 4.x rate limit is shared across Opus 4.6, 4.5, 4.1, and 4.
The API uses a token bucket algorithm where capacity is continuously replenished rather than reset at fixed intervals. You can also see 429 errors due to acceleration limits when your organization has a sharp increase in usage.
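To make the "continuously replenished" behavior concrete, here is a toy token bucket. It is a sketch of the general algorithm only: the class name `TokenBucket`, the capacity of 2, and the refill rate of 1 token per second are illustrative assumptions, not the API's actual parameters.

```python
class TokenBucket:
    """Toy token bucket: capacity refills continuously at `rate_per_sec`."""

    def __init__(self, capacity, rate_per_sec, start=0.0):
        self.capacity = capacity
        self.rate_per_sec = rate_per_sec
        self.tokens = capacity  # start with a full bucket
        self.last = start       # timestamp of the last refill

    def try_acquire(self, now, cost=1.0):
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # a rate-limited API would answer 429 at this point


bucket = TokenBucket(capacity=2, rate_per_sec=1.0)
results = [bucket.try_acquire(now=t) for t in (0.0, 0.0, 0.0, 0.5, 1.0)]
print(results)  # [True, True, False, False, True]
```

The third and fourth calls fail because the bucket is empty or only half refilled, while the call at t=1.0 succeeds without waiting for any fixed window boundary. This is why a short burst can trigger a 429 even when the per-minute average is under the limit.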
Full Solution
Python SDK with Built-in Retries
The Python SDK retries 429 errors automatically with exponential backoff (2 retries by default):
import anthropic
# Default: 2 retries with exponential backoff
client = anthropic.Anthropic()
# Increase retries for high-throughput workloads
client = anthropic.Anthropic(max_retries=5)
# Override retries per request
message = client.with_options(max_retries=5).messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
TypeScript SDK with Built-in Retries
import Anthropic from "@anthropic-ai/sdk";
// Default: 2 retries with exponential backoff
const client = new Anthropic();
// Increase retries
const client2 = new Anthropic({ maxRetries: 5 });
Monitor Rate Limit Headers
The API returns rate limit status in response headers. Check these before you hit the limit:
import anthropic
client = anthropic.Anthropic()
response = client.messages.with_raw_response.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello"}]
)
# Check remaining capacity
remaining_requests = response.headers.get("anthropic-ratelimit-requests-remaining")
remaining_tokens = response.headers.get("anthropic-ratelimit-tokens-remaining")
reset_time = response.headers.get("anthropic-ratelimit-requests-reset")
retry_after = response.headers.get("retry-after")
print(f"Requests remaining: {remaining_requests}")
print(f"Tokens remaining: {remaining_tokens}")
print(f"Reset at: {reset_time}")
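Reading the headers is only useful if you act on them. The sketch below turns the remaining-requests and reset headers into a pause duration. It assumes the reset header holds an RFC 3339 timestamp, which you should verify against the API documentation. The helper name `seconds_until_reset` and the `floor` threshold are hypothetical.

```python
from datetime import datetime, timezone


def seconds_until_reset(headers, now=None, floor=1):
    """Return seconds to pause when remaining requests drop to `floor` or below.

    Returns 0.0 when there is enough headroom or the headers are missing.
    Assumes the reset header is an RFC 3339 timestamp (an assumption).
    """
    remaining = headers.get("anthropic-ratelimit-requests-remaining")
    reset = headers.get("anthropic-ratelimit-requests-reset")
    if remaining is None or reset is None:
        return 0.0
    try:
        if int(remaining) > floor:
            return 0.0
        # fromisoformat() only accepts a trailing "Z" on Python 3.11+.
        reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
    except ValueError:
        return 0.0  # unparseable header: do not block the caller
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())
```

A caller could pass `response.headers` from the example above and sleep for the returned number of seconds before sending the next request. The same pattern applies to the token headers.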
Rate Limit Headers Reference
| Header | Description |
|---|---|
| anthropic-ratelimit-requests-limit | Maximum requests per period |
| anthropic-ratelimit-requests-remaining | Requests left in current window |
| anthropic-ratelimit-requests-reset | When the request limit resets |
| anthropic-ratelimit-tokens-limit | Maximum tokens per period |
| anthropic-ratelimit-tokens-remaining | Tokens left in current window |
| anthropic-ratelimit-tokens-reset | When the token limit resets |
| retry-after | Seconds to wait before retrying |
Prevention
- Use prompt caching: Cache-read input tokens do NOT count towards ITPM limits. With an 80% cache hit rate and a 2M ITPM limit, your effective throughput reaches 10M tokens per minute.
- Use the Batch API: For non-time-sensitive workloads, the Message Batches API has separate, higher rate limits (500,000 batch requests in queue at Tier 4) and costs 50% less.
- Upgrade your spend tier: Higher tiers get higher rate limits. Tier 1 starts at a $5 credit purchase; Tier 4 requires $400.
- Set workspace limits: Configure per-workspace limits to prevent one service from consuming your entire organization’s quota.
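The prompt caching figure in the first item follows from simple arithmetic: if cache reads do not count, only the uncached share of input tokens is charged against ITPM. The sketch below works through it, assuming "cache hit rate" means the fraction of input tokens served as cache reads. The function name `effective_itpm` is hypothetical.

```python
def effective_itpm(itpm_limit, cache_hit_rate):
    """Total input tokens per minute that fit under `itpm_limit`.

    Assumes a fraction `cache_hit_rate` of input tokens are cache reads
    that are not counted against the limit.
    """
    if not 0.0 <= cache_hit_rate < 1.0:
        raise ValueError("cache_hit_rate must be in [0, 1)")
    # Only the uncached share (1 - hit rate) is counted against the limit.
    return itpm_limit / (1.0 - cache_hit_rate)


print(f"{effective_itpm(2_000_000, 0.8):,.0f}")  # 10,000,000
```

At an 80% hit rate only 20% of tokens count, so a 2M limit covers 2M / 0.2 = 10M total input tokens per minute.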
Related Guides
- Claude API Error 529 overloaded_error Fix – handle API overload errors that look similar to rate limits.
- Claude Prompt Caching API Guide – reduce ITPM usage by caching repeated prompt content.
- Claude API Error 400 invalid_request_error Fix – debug malformed requests that waste your rate limit budget.
- Claude Streaming API Guide – streaming responses help avoid timeout issues on long requests.
- Claude SDK Timeout Configuration – configure timeouts alongside retry logic.