Anthropic Message Batches API Guide

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS · Join 50+ builders · More at zovo.one

The Problem

You need to process hundreds or thousands of Claude API requests, but sending them one at a time is slow, costs full price, and runs into rate limits. Real-time responses are not required for your use case.

Quick Fix

Use the Message Batches API to submit up to 100,000 requests in a single batch at 50% of standard API pricing:

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

message_batch = client.messages.batches.create(
    requests=[
        Request(
            custom_id="request-1",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[{"role": "user", "content": "Summarize this document..."}],
            ),
        ),
    ]
)
print(message_batch.id)

What’s Happening

The Message Batches API processes requests asynchronously instead of synchronously. When you submit a batch, Anthropic queues all requests and processes them in parallel. Most batches complete within 1 hour. Each request in the batch is handled independently, so one failure does not affect others.

The key advantage is cost: all batch usage is charged at 50% of standard API prices. For Claude Sonnet 4.6, that means $1.50 per million input tokens and $7.50 per million output tokens instead of $3 and $15 respectively.
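The savings scale linearly with token volume. A quick sanity check using the Sonnet 4.6 rates quoted above (the helper names here are illustrative, not part of any SDK):

```python
# Rates from the text (USD per million tokens), Claude Sonnet 4.6.
STANDARD = {"input": 3.00, "output": 15.00}
BATCH = {"input": 1.50, "output": 7.50}

def cost(rates, input_tokens, output_tokens):
    """Cost in USD for a given token volume at the given per-MTok rates."""
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000

# Example workload: 10M input tokens, 2M output tokens.
standard_cost = cost(STANDARD, 10_000_000, 2_000_000)  # 30 + 30 = 60.0
batch_cost = cost(BATCH, 10_000_000, 2_000_000)        # 15 + 15 = 30.0
print(f"Standard: ${standard_cost:.2f}, Batch: ${batch_cost:.2f}")
```

At any volume, the batch bill comes out to exactly half the standard bill.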

Step-by-Step Fix

Step 1: Prepare your batch requests

Each request needs a unique custom_id (1-64 characters: letters, digits, hyphens, and underscores) and a params object with standard Messages API parameters:

import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

requests = []
for i, doc in enumerate(documents):  # documents: your list of input texts
    requests.append(
        Request(
            custom_id=f"doc-{i}",
            params=MessageCreateParamsNonStreaming(
                model="claude-sonnet-4-6",
                max_tokens=1024,
                messages=[
                    {"role": "user", "content": f"Summarize: {doc}"}
                ],
            ),
        )
    )

message_batch = client.messages.batches.create(requests=requests)
print(f"Batch ID: {message_batch.id}")
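Malformed or duplicate custom_ids are easier to catch before submission than after. A small pre-flight check under the 1-64 character rule stated above (validate_custom_ids is a hypothetical helper, not an SDK function):

```python
import re

# Per the constraint above: 1-64 chars of letters, digits, hyphens, underscores.
CUSTOM_ID_RE = re.compile(r"^[A-Za-z0-9_-]{1,64}$")

def validate_custom_ids(custom_ids):
    """Return the invalid and duplicated custom_ids (empty list if all OK)."""
    bad = [cid for cid in custom_ids if not CUSTOM_ID_RE.fullmatch(cid)]
    seen, dupes = set(), []
    for cid in custom_ids:
        if cid in seen:
            dupes.append(cid)
        seen.add(cid)
    return bad + dupes
```

Run it over `[r["custom_id"] for r in requests]` and refuse to submit if it returns anything, since duplicate IDs make results impossible to match back to inputs.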

Step 2: Poll for completion

Check the batch status until processing finishes:

import time

while True:
    batch = client.messages.batches.retrieve(message_batch.id)
    if batch.processing_status == "ended":
        break
    print(f"Status: {batch.processing_status} - "
          f"{batch.request_counts.succeeded} succeeded, "
          f"{batch.request_counts.processing} processing")
    time.sleep(30)
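For production use, the loop above is worth wrapping with a deadline so a stuck pipeline fails loudly instead of polling forever. A sketch of the same logic with a timeout (wait_for_batch is a hypothetical wrapper around the retrieve call shown above):

```python
import time

def wait_for_batch(client, batch_id, poll_seconds=30, timeout_seconds=3600):
    """Poll until the batch's processing_status is "ended", or raise
    TimeoutError if the deadline passes first. Returns the final batch object.
    """
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            return batch
        time.sleep(poll_seconds)
    raise TimeoutError(f"Batch {batch_id} did not end within {timeout_seconds}s")
```

Note that "ended" only means processing finished; individual requests inside the batch may still have errored or expired, which Step 4 covers.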

Step 3: Retrieve results

Stream results for the completed batch:

for result in client.messages.batches.results(message_batch.id):
    if result.result.type == "succeeded":
        print(f"{result.custom_id}: {result.result.message.content[0].text}")
    elif result.result.type == "errored":
        print(f"{result.custom_id}: Error - {result.result.error}")
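Two details are easy to miss here: results are not guaranteed to come back in submission order, and besides "succeeded" and "errored" a result can also be "canceled" or "expired". A bucketing sketch that handles all four (partition_results is a hypothetical helper; it models only the custom_id and result-type fields of the stream above):

```python
from collections import defaultdict

# Each batch result ends in one of these four terminal types.
RESULT_TYPES = ("succeeded", "errored", "canceled", "expired")

def partition_results(results):
    """Group (custom_id, result_type) pairs into one bucket per terminal
    type, so results can be matched back to inputs by custom_id rather
    than by position.
    """
    buckets = defaultdict(list)
    for custom_id, result_type in results:
        buckets[result_type].append(custom_id)
    # Errored and expired requests are the candidates for resubmission.
    retryable = [cid for t in ("errored", "expired") for cid in buckets[t]]
    return dict(buckets), retryable
```

Feed it pairs pulled from the stream, e.g. `(r.custom_id, r.result.type)` for each `r` in `client.messages.batches.results(message_batch.id)`.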

Step 4: Handle errors and expiration

Individual requests can fail without affecting the batch. Batches expire if processing does not complete within 24 hours. Results are available for 29 days after creation.

batch = client.messages.batches.retrieve(message_batch.id)
counts = batch.request_counts
print(f"Succeeded: {counts.succeeded}")
print(f"Errored: {counts.errored}")
print(f"Expired: {counts.expired}")
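When the errored or expired counts are non-zero, the failed requests can be resubmitted as a fresh batch. Since the Request objects built in Step 1 are dict-like, keeping the original list around makes this a simple filter (requests_to_retry is a hypothetical helper):

```python
def requests_to_retry(original_requests, failed_ids):
    """Pick the original request payloads whose custom_id failed,
    preserving order, ready to resubmit via
    client.messages.batches.create(requests=...).
    """
    failed = set(failed_ids)
    return [r for r in original_requests if r["custom_id"] in failed]

# failed_ids would be collected while streaming results, e.g.:
# failed_ids = [r.custom_id for r in client.messages.batches.results(batch.id)
#               if r.result.type in ("errored", "expired")]
```

Because custom_ids are unique within a batch, the retry batch can safely reuse them.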

Step 5: Use with prompt caching for better performance

Since batches can take time to process, use the 1-hour cache duration for shared context:

requests.append(
    Request(
        custom_id=f"doc-{i}",
        params=MessageCreateParamsNonStreaming(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            system=[{
                "type": "text",
                "text": shared_system_prompt,
                # "ttl": "1h" opts into the 1-hour cache; the default
                # ephemeral TTL is only 5 minutes
                "cache_control": {"type": "ephemeral", "ttl": "1h"}
            }],
            messages=[
                {"role": "user", "content": f"Analyze: {doc}"}
            ],
        ),
    )
)

Batch limits

Max requests per batch: 100,000 (or 256 MB total request size, whichever is hit first)
Processing window: 24 hours; requests still pending after that are marked expired
Results retention: 29 days after batch creation

Pricing reference

Model Batch Input Batch Output
Claude Opus 4.6 $2.50/MTok $12.50/MTok
Claude Sonnet 4.6 $1.50/MTok $7.50/MTok
Claude Haiku 4.5 $0.50/MTok $2.50/MTok

Prevention

Design your batch pipelines to handle partial failures. Always check request_counts after processing ends. Implement retry logic for expired or errored requests by resubmitting them in a new batch.

For large-scale evaluations, split work into multiple batches under the 100K request limit and process them concurrently for maximum throughput.
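Splitting a large request list into compliant batches is a one-liner worth getting right at the boundaries (chunk_requests is a hypothetical helper; the commented submission loop assumes the client from Step 1):

```python
def chunk_requests(requests, batch_size=100_000):
    """Split a request list into consecutive chunks at or under the
    100K per-batch request limit, preserving order.
    """
    return [requests[i:i + batch_size] for i in range(0, len(requests), batch_size)]

# Each chunk is then submitted as its own batch and polled independently:
# batch_ids = [client.messages.batches.create(requests=chunk).id
#              for chunk in chunk_requests(all_requests)]
```

Keep custom_ids globally unique across chunks (e.g. `doc-{i}` with a running index) so results from concurrent batches never collide when merged.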


Level Up Your Claude Code Workflow

The developers who get the most out of Claude Code aren’t just fixing errors — they’re running multi-agent pipelines, using battle-tested CLAUDE.md templates, and shipping with production-grade operating principles.

Get Claude Code Mastery — included in Zovo Lifetime →

16 CLAUDE.md templates · 80+ prompts · orchestration configs · workflow playbooks. $99 once, free forever.