Fix Claude API Rate Limit Errors (HTTP 429)

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS · Join 50+ builders · More at zovo.one

TL;DR: HTTP 429 means you have exceeded your request or token rate limit. Implement exponential backoff, check your usage tier limits, and batch requests to stay within bounds.

The Problem

Your Claude API calls start failing with HTTP 429:

{
  "type": "error",
  "error": {
    "type": "rate_limit_error",
    "message": "Rate limit exceeded. Please retry after X seconds."
  }
}

Why This Happens

Anthropic enforces rate limits across three dimensions:

RPM   — Requests per minute
ITPM  — Input tokens per minute
OTPM  — Output tokens per minute

Limits vary by usage tier (Tier 1 through Tier 4, plus Monthly Invoicing) and by model. Opus models typically have lower limits than Sonnet or Haiku. You can view your current tier and limits in the Anthropic Console.

The Fix

Step 1 — Check Your Current Limits and Usage

# Make a request and inspect rate limit headers
curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5-20250514","max_tokens":32,"messages":[{"role":"user","content":"hi"}]}' \
  -o /dev/null 2>&1 | grep -i "rate\|limit\|retry"

Key response headers returned on every request:

anthropic-ratelimit-requests-limit        — Your RPM cap
anthropic-ratelimit-requests-remaining    — Remaining requests this window
anthropic-ratelimit-requests-reset        — When the RPM window resets (ISO 8601)
anthropic-ratelimit-input-tokens-limit    — Your ITPM cap
anthropic-ratelimit-input-tokens-remaining
anthropic-ratelimit-input-tokens-reset
anthropic-ratelimit-output-tokens-limit   — Your OTPM cap
anthropic-ratelimit-output-tokens-remaining
anthropic-ratelimit-output-tokens-reset
retry-after                               — Seconds to wait before retrying (on 429 responses)
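If you want to act on these headers programmatically, a small helper can decide how long to wait. This is a sketch with a hypothetical helper name: it prefers retry-after when present, falls back to parsing the ISO 8601 reset timestamp, and uses a conservative default otherwise.

```python
from datetime import datetime, timezone

def seconds_until_reset(headers: dict) -> float:
    """Hypothetical helper: derive a wait time from rate limit headers.

    Prefers retry-after (seconds), falls back to the ISO 8601
    requests-reset timestamp, and defaults to 1s if neither is present.
    """
    if "retry-after" in headers:
        return float(headers["retry-after"])
    reset = headers.get("anthropic-ratelimit-requests-reset")
    if reset:
        reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
        return max(0.0, (reset_at - datetime.now(timezone.utc)).total_seconds())
    return 1.0  # conservative default when no header is available
```

The same pattern works for the input- and output-token reset headers if ITPM or OTPM is your binding limit.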

Step 2 — Implement Proper Retry Logic

Python SDK (the client retries automatically via max_retries; the loop below layers a capped exponential backoff on top for long rate-limit windows):

import anthropic
import time

client = anthropic.Anthropic(max_retries=5)

def call_with_backoff(messages, max_attempts=5):
    """Call API with manual backoff for rate limits."""
    for attempt in range(max_attempts):
        try:
            return client.messages.create(
                model="claude-sonnet-4-5-20250514",
                max_tokens=1024,
                messages=messages
            )
        except anthropic.RateLimitError as e:
            if attempt == max_attempts - 1:
                raise
            wait = min(2 ** attempt, 60)  # Cap at 60s
            print(f"Rate limited. Waiting {wait}s...")
            time.sleep(wait)
    return None  # Unreachable but satisfies bounded return

TypeScript SDK:

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic({ maxRetries: 5 });

async function callWithBackoff(
  messages: Anthropic.MessageParam[],
  maxAttempts = 5
): Promise<Anthropic.Message | null> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await client.messages.create({
        model: "claude-sonnet-4-5-20250514",
        max_tokens: 1024,
        messages,
      });
    } catch (error) {
      if (error instanceof Anthropic.RateLimitError) {
        if (attempt === maxAttempts - 1) throw error;
        const wait = Math.min(2 ** attempt * 1000, 60000);
        await new Promise((r) => setTimeout(r, wait));
      } else {
        throw error;
      }
    }
  }
  return null;
}
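Fixed exponential waits can synchronize many clients retrying at the same moment. A common refinement, not shown in the SDK snippets above, is full jitter: draw a random wait between zero and the exponential ceiling. A minimal sketch (the function name and defaults are illustrative):

```python
import random

def backoff_schedule(max_attempts: int = 5, base: float = 1.0,
                     cap: float = 60.0, jitter: bool = True,
                     rng=random.random) -> list[float]:
    """Return the wait (in seconds) before each retry attempt.

    With jitter=True each wait is drawn uniformly from [0, min(base * 2**n, cap)),
    which spreads retries from many workers instead of synchronizing them.
    """
    waits = []
    for attempt in range(max_attempts):
        ceiling = min(base * 2 ** attempt, cap)
        waits.append(ceiling * rng() if jitter else ceiling)
    return waits
```

To use it, replace the fixed `min(2 ** attempt, 60)` wait in the retry loops above with the corresponding jittered value.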

Step 3 — Batch Requests Efficiently

If you are processing many items, use a rate-limited queue:

import asyncio
import anthropic

MAX_CONCURRENT = 5  # Stay well below RPM limit
semaphore = asyncio.Semaphore(MAX_CONCURRENT)
client = anthropic.AsyncAnthropic(max_retries=3)

async def process_item(item):
    async with semaphore:
        return await client.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=512,
            messages=[{"role": "user", "content": item}]
        )

async def main():
    items = ["task1", "task2", "task3"]  # Your work items
    results = await asyncio.gather(
        *[process_item(item) for item in items],
        return_exceptions=True
    )
    failures = [r for r in results if isinstance(r, Exception)]
    if failures:
        raise RuntimeError(f"{len(failures)} items failed; first error: {failures[0]}")
    return results
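A semaphore bounds concurrency but not throughput per minute. If your RPM cap is the binding constraint, you can also plan work into per-minute batches. This is a sketch; plan_batches is a hypothetical helper and the 0.8 safety factor is an assumption that leaves headroom for retries:

```python
def plan_batches(n_items: int, rpm_limit: int, safety: float = 0.8) -> list[int]:
    """Split n_items into per-minute batch sizes that stay under a
    fraction (safety) of the RPM cap, leaving headroom for retries."""
    per_minute = max(1, int(rpm_limit * safety))
    return [min(per_minute, n_items - i) for i in range(0, n_items, per_minute)]
```

For example, on a 50 RPM tier, plan_batches(120, 50) yields [40, 40, 40]: three batches, with a roughly 60-second pause between them.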

Step 4 — Verify Rate Limits Are Not Exhausted

# Check remaining quota from response headers
curl -s -D - https://api.anthropic.com/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d '{"model":"claude-sonnet-4-5-20250514","max_tokens":16,"messages":[{"role":"user","content":"1"}]}' \
  -o /dev/null 2>&1 | grep -i "ratelimit-requests-remaining"

Expected output:

anthropic-ratelimit-requests-remaining: 48

Common Variations

| Scenario | Cause | Quick Fix |
|---|---|---|
| 429 only with Opus | Lower per-model RPM/ITPM limits | Switch to Sonnet for high-throughput tasks |
| 429 in parallel workers | Too many simultaneous requests hitting RPM | Add a semaphore / rate limiter |
| Consistent 429 at start of each minute | ITPM or OTPM exhausted before RPM | Reduce prompt size or output max_tokens |
| Need higher limits | Current tier caps too low | Upgrade usage tier in the Anthropic Console |
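For the ITPM/OTPM case, you can sanity-check prompt volume before sending. The ~4 characters per token figure below is a rough English-text heuristic, not an exact tokenizer, and both helper names are illustrative:

```python
def rough_token_estimate(text: str) -> int:
    """Very rough heuristic: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

def fits_itpm(prompts: list[str], itpm_limit: int, safety: float = 0.8) -> bool:
    """True if a minute's worth of prompts fits under a fraction of the ITPM cap."""
    total = sum(rough_token_estimate(p) for p in prompts)
    return total <= itpm_limit * safety
```

For accurate counts, use the API's token counting endpoint rather than a heuristic; this sketch is only for cheap pre-flight checks.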

Prevention

- Monitor the anthropic-ratelimit-* response headers and alert before remaining quota hits zero.
- Keep SDK retries enabled (max_retries) and add capped exponential backoff for long rate-limit windows.
- Cap concurrency with a semaphore sized well below your RPM limit.
- Trim prompts and set max_tokens conservatively so ITPM/OTPM are not exhausted before RPM.
- Request a tier upgrade in the Anthropic Console before a traffic spike, not after.
Last verified: 2026-04-15. Found an issue? Open a GitHub issue.