Fix: Claude API Error 429 Rate Limit

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS · More at zovo.one

The Error

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit reached. Please try again later."
  }
}

HTTP status code: 429 Too Many Requests. The response includes a retry-after header indicating how long to wait.

Quick Fix

  1. Check the retry-after response header and wait the specified duration
  2. Enable SDK automatic retries (both SDKs retry 429s by default with 2 retries)
  3. Reduce request frequency or switch to the Message Batches API for bulk workloads

For example, raising the retry count in the TypeScript SDK:

const client = new Anthropic({
  maxRetries: 5, // Default is 2
});

What Causes This

Anthropic enforces rate limits at multiple levels using a token bucket algorithm where capacity is continuously replenished rather than reset at fixed intervals.
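The token bucket model can be sketched in a few lines. This is an illustration of the concept, not Anthropic's actual implementation, and the capacity and refill numbers in the usage below are made up:

```python
import time

class TokenBucket:
    """Capacity refills continuously at refill_rate units/sec, up to a cap."""

    def __init__(self, capacity: float, refill_rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity          # start full
        self.clock = clock
        self.last_refill = clock()

    def try_consume(self, amount: float) -> bool:
        """Spend `amount` tokens if available, refilling first based on elapsed time."""
        now = self.clock()
        elapsed = now - self.last_refill
        # Continuous replenishment rather than a fixed-window reset
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= amount:
            self.tokens -= amount
            return True
        return False  # the caller would see a 429 here
```

A burst can drain the bucket even when average usage is under the limit, which is why sharp spikes can trigger 429s that the same traffic spread out would not.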

Types of rate limits:

  - RPM (requests per minute)
  - ITPM (input tokens per minute)
  - OTPM (output tokens per minute)

Rate limits at Tier 4 (highest self-serve tier):

Model        RPM      ITPM         OTPM
Opus 4.x     4,000    2,000,000    400,000
Sonnet 4.x   4,000    2,000,000    400,000
Haiku 4.5    4,000    4,000,000    800,000

Opus 4.x rate limits are shared across Opus 4.6, 4.5, 4.1, and 4. Sonnet 4.x rate limits are shared across Sonnet 4.6, 4.5, and 4.

Cache-aware ITPM: Only uncached input tokens count towards ITPM for most models. With an 80% cache hit rate and a 2M ITPM limit, effective throughput reaches 10M tokens per minute.
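That arithmetic checks out: when only the uncached fraction counts, effective throughput is the limit divided by (1 - cache hit rate). A quick sanity check (the function name is ours, not part of any SDK):

```python
def effective_itpm(itpm_limit: int, cache_hit_rate: float) -> int:
    """Effective input throughput when only uncached tokens count toward ITPM."""
    uncached_fraction = 1.0 - cache_hit_rate
    return int(itpm_limit / uncached_fraction)

print(effective_itpm(2_000_000, 0.80))  # 10000000 tokens per minute
```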

You may also see 429 errors from acceleration limits – sharp increases in usage can trigger rate limiting even if you are within your tier’s steady-state limits.

Full Solution

Option 1: SDK Automatic Retry

Both official SDKs include built-in retry with exponential backoff. The default is 2 retries for connection errors, 408, 409, 429, and 500+ status codes.

// TypeScript SDK
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({
  maxRetries: 5, // Default is 2
});

# Python SDK
import anthropic

client = anthropic.Anthropic(
    max_retries=5,  # Default is 2
)

# Or per-request override:
client.with_options(max_retries=5).messages.create(...)

Option 2: Manual Retry with Backoff

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function callWithBackoff(
  fn: () => Promise<any>,
  maxRetries = 5,
  baseDelay = 1000
): Promise<any> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (
        error instanceof Anthropic.RateLimitError &&
        attempt < maxRetries - 1
      ) {
        const delay = baseDelay * 2 ** attempt + Math.random() * 1000;
        await new Promise((r) => setTimeout(r, delay));
        continue;
      }
      throw error;
    }
  }
}

Option 3: Use the Message Batches API

For workloads that can tolerate asynchronous processing, the Message Batches API offers 50% cost savings and separate rate limits. Most batches complete within 1 hour.

import anthropic

client = anthropic.Anthropic()

batch = client.messages.batches.create(
    requests=[
        {
            "custom_id": f"request-{i}",
            "params": {
                "model": "claude-sonnet-4-6",
                "max_tokens": 1024,
                "messages": [{"role": "user", "content": prompt}],
            },
        }
        for i, prompt in enumerate(prompts)
    ]
)

# Check batch status later
result = client.messages.batches.retrieve(batch.id)

A single batch can contain up to 100,000 requests or 256 MB, whichever is reached first. Results are available for 29 days after creation.
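To pick up results, one common pattern is to poll processing_status until it reaches "ended" and then iterate the results. A sketch (the client is passed in so the helper stays testable; the poll interval is an arbitrary choice):

```python
import time

def wait_for_batch(client, batch_id: str, poll_seconds: float = 60.0):
    """Poll a Message Batch until processing ends, then return the final batch object."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)

# Once ended, stream the per-request results:
# for entry in client.messages.batches.results(batch_id):
#     print(entry.custom_id, entry.result.type)
```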

Option 4: Monitor Rate Limit Headers

The API returns these headers with every response:

  - anthropic-ratelimit-requests-limit / -remaining / -reset
  - anthropic-ratelimit-input-tokens-limit / -remaining / -reset
  - anthropic-ratelimit-output-tokens-limit / -remaining / -reset
  - retry-after (on 429 responses, seconds to wait)

Monitor the remaining headers to throttle your requests before hitting the limit.
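For instance, a small helper can decide whether to pause before the next call based on the remaining-requests header. A sketch, assuming Anthropic's documented anthropic-ratelimit-* header scheme; the threshold and fallback pause are our own choices:

```python
def pause_before_next_request(headers: dict, threshold: int = 5) -> float:
    """Return seconds to sleep before the next request, based on rate limit headers."""
    remaining = int(headers.get("anthropic-ratelimit-requests-remaining", threshold + 1))
    if remaining > threshold:
        return 0.0  # plenty of headroom, no pause needed
    # Near the limit: honor retry-after if the server sent one
    retry_after = headers.get("retry-after")
    if retry_after is not None:
        return float(retry_after)
    return 1.0  # conservative default when no hint is available
```

Calling this after each response and sleeping for the returned duration throttles the client proactively instead of waiting to be rejected with a 429.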

Prevention