Claude Batch Processing Limits and Best Practices

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS Join 50+ builders · More at zovo.one

Claude’s Batch API has two hard limits: 100,000 requests per batch and 256 MB total payload size. Exceeding either causes the batch to be rejected. Within those limits, you get a 50% discount on every token – Opus 4.7 drops from $5.00/$25.00 to $2.50/$12.50 per million tokens. Here is how to work within these constraints for maximum savings.

The Setup

You are scaling a data processing pipeline from 10,000 to 500,000 requests. At 10K requests, a single batch handles everything. At 500K, you need to split across 5+ batches, handle partial failures across batches, and manage concurrent batch processing.

The 50% savings at this volume is substantial. On Sonnet 4.6 with 5K input and 3K output tokens per request, 500K requests at standard pricing costs $30,000. At batch pricing: $15,000. You save $15,000 per run – but only if you handle the limits correctly.

The Math

Payload size estimation:

Average request JSON size: ~2-3 KB (5K input tokens + params + headers)

500K requests, Sonnet 4.6, 5K input + 3K output:

Standard pricing:

Batch pricing:

Savings: $15,000 (50%)

Splitting 500K into batches of 80K each (safe margin under limits):

The Technique

Here is a production-grade batch manager that handles chunking, concurrent submission, and error recovery:

import anthropic
import json
import time
import threading
from dataclasses import dataclass, field

client = anthropic.Anthropic()

MAX_REQUESTS = 100_000
MAX_BYTES = 256 * 1024 * 1024  # 256 MB
SAFE_MARGIN = 0.9  # Use 90% of limits for safety


@dataclass
class BatchResult:
    batch_id: str
    succeeded: int = 0
    failed: int = 0
    failed_ids: list = field(default_factory=list)
    results: list = field(default_factory=list)


def chunk_by_limits(
    requests: list[dict],
    max_count: int = int(MAX_REQUESTS * SAFE_MARGIN),
    max_bytes: int = int(MAX_BYTES * SAFE_MARGIN)
) -> list[list[dict]]:
    """Split requests into chunks respecting both hard limits."""
    chunks = []
    current = []
    current_bytes = 0

    for req in requests:
        req_bytes = len(json.dumps(req).encode())

        if len(current) >= max_count or current_bytes + req_bytes > max_bytes:
            if current:
                chunks.append(current)
            current = []
            current_bytes = 0

        current.append(req)
        current_bytes += req_bytes

    if current:
        chunks.append(current)

    return chunks


def submit_batch(requests: list[dict]) -> str:
    """Submit a single batch and return its ID."""
    batch = client.batches.create(requests=requests)
    return batch.id


def wait_for_batch(batch_id: str) -> BatchResult:
    """Poll batch until complete, return structured result."""
    result = BatchResult(batch_id=batch_id)

    while True:
        status = client.batches.retrieve(batch_id)

        if status.processing_status == "ended":
            counts = status.request_counts
            print(f"  Batch {batch_id[:12]}: "
                  f"{counts.succeeded} ok, {counts.errored} errors")
            break

        time.sleep(30)

    for item in client.batches.results(batch_id):
        if item.result.type == "succeeded":
            result.succeeded += 1
            result.results.append({
                "id": item.custom_id,
                "content": item.result.message.content[0].text
            })
        else:
            result.failed += 1
            result.failed_ids.append(item.custom_id)

    return result


def process_at_scale(requests: list[dict]) -> dict:
    """Process any number of requests with automatic chunking."""
    chunks = chunk_by_limits(requests)

    print(f"Total: {len(requests)} requests in {len(chunks)} batches")
    for i, chunk in enumerate(chunks):
        size_mb = sum(len(json.dumps(r).encode()) for r in chunk) / 1e6
        print(f"  Batch {i+1}: {len(chunk)} requests, {size_mb:.1f} MB")

    # Submit all batches
    batch_ids = []
    for i, chunk in enumerate(chunks):
        bid = submit_batch(chunk)
        batch_ids.append(bid)
        print(f"  Submitted batch {i+1}: {bid[:12]}")

    # Wait for all batches (could parallelize with threads)
    all_results = []
    all_failed = []

    for bid in batch_ids:
        br = wait_for_batch(bid)
        all_results.extend(br.results)
        all_failed.extend(br.failed_ids)

    return {
        "total_succeeded": len(all_results),
        "total_failed": len(all_failed),
        "failed_ids": all_failed,
        "results": all_results
    }

Best practices for production batch processing:

# Pre-submission validation
python3 -c "
import json

# Validate batch before submission
requests = [json.loads(l) for l in open('batch_requests.jsonl')]

total_bytes = sum(len(json.dumps(r).encode()) for r in requests)
total_count = len(requests)

print(f'Requests: {total_count:,} (limit: 100,000)')
print(f'Size: {total_bytes/1e6:.1f} MB (limit: 256 MB)')

if total_count > 100000:
    chunks_needed = (total_count // 90000) + 1
    print(f'WARNING: Must split into {chunks_needed} batches')

if total_bytes > 256 * 1024 * 1024:
    chunks_needed = (total_bytes // (230 * 1024 * 1024)) + 1
    print(f'WARNING: Must split into {chunks_needed} batches (size)')

# Validate each request has custom_id
missing_ids = [i for i, r in enumerate(requests) if 'custom_id' not in r]
if missing_ids:
    print(f'ERROR: {len(missing_ids)} requests missing custom_id')
"

Key best practices:

  1. Use 90% of limits to account for JSON serialization overhead differences between your estimate and the API’s parser.
  2. Retry failed requests in separate batches. Collect failed_ids, rebuild those requests, submit as a new batch.
  3. Use deterministic custom_ids. If you need to retry a batch, duplicate IDs let you deduplicate results.
  4. Process results promptly. The 29-day retention window is generous but not permanent.
  5. Monitor batch duration. If batches consistently approach the 24-hour timeout, reduce batch size.

The Tradeoffs

Operating at batch limits introduces specific risks:

Implementation Checklist

  1. Estimate per-request payload size (JSON serialized bytes)
  2. Calculate total requests x size to determine chunking needs
  3. Implement the chunking function with 90% safety margin
  4. Add pre-submission validation (count, size, custom_id presence)
  5. Submit chunks and track batch IDs for result retrieval
  6. Implement retry pipeline for failed requests
  7. Set up alerting for batches approaching the 24-hour timeout

Measuring Impact

Monitor batch operations at scale: