Claude Batch Processing Limits (2026)

Last updated: April 19, 2026

Claude’s Batch API has two hard limits: 100,000 requests per batch and 256 MB total payload size. Exceeding either causes the batch to be rejected. Within those limits, you get a 50% discount on every token – Opus 4.7 drops from $5.00/$25.00 to $2.50/$12.50 per million tokens. Here is how to work within these constraints for maximum savings.

The Setup

You are scaling a data processing pipeline from 10,000 to 500,000 requests. At 10K requests, a single batch handles everything. At 500K, you need to split across 5+ batches, handle partial failures across batches, and manage concurrent batch processing.

The 50% savings at this volume is substantial. On Sonnet 4.6 with 5K input and 3K output tokens per request, 500K requests at standard pricing costs $30,000. At batch pricing: $15,000. You save $15,000 per run – but only if you handle the limits correctly.

The Math

Payload size estimation:

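Average request JSON size: ~2-3 KB (prompt text + params + headers). This fits short prompts; a full 5K-token input is closer to 20 KB at roughly 4 bytes per token of English text, in which case the 256 MB size limit binds long before the 100K request limit.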

100K requests x 2.5 KB = 250 MB (near the 256 MB limit)
100K requests x 3 KB = 300 MB (exceeds limit, must split)
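To see which of the two limits binds first for a given request size, a rough helper like the one below can be used. It is a sketch only: the function name is illustrative, and it assumes the decimal-MB arithmetic used in the estimate above (1 MB = 1,000,000 bytes).

```python
MAX_REQUESTS = 100_000
MAX_BYTES = 256 * 1000 * 1000  # assumption: decimal MB, matching the estimate above


def max_requests_per_batch(avg_request_bytes: int) -> int:
    """Largest request count that satisfies both the count and the size limit."""
    if avg_request_bytes <= 0:
        raise ValueError("avg_request_bytes must be positive")
    return min(MAX_REQUESTS, MAX_BYTES // avg_request_bytes)


print(max_requests_per_batch(2_500))  # 100000: the request-count limit binds
print(max_requests_per_batch(3_000))  # 85333: the size limit binds
```

Measure the real serialized size of a sample of your own requests before trusting either number.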

500K requests, Sonnet 4.6, 5K input + 3K output:

Standard pricing:

Input: 2.5B tokens x $3.00/MTok = $7,500
Output: 1.5B tokens x $15.00/MTok = $22,500
Total: $30,000

Batch pricing:

Input: 2.5B x $1.50/MTok = $3,750
Output: 1.5B x $7.50/MTok = $11,250
Total: $15,000

Savings: $15,000 (50%)
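The same arithmetic can be reproduced in a few lines, which helps when you want to plug in your own volumes. This is a minimal sketch; the function name is illustrative and the prices are the per-million-token figures quoted above.

```python
def run_cost(requests: int, in_tokens: int, out_tokens: int,
             in_price: float, out_price: float) -> float:
    """Total cost in dollars; prices are dollars per million tokens."""
    input_cost = requests * in_tokens / 1_000_000 * in_price
    output_cost = requests * out_tokens / 1_000_000 * out_price
    return input_cost + output_cost


standard = run_cost(500_000, 5_000, 3_000, 3.00, 15.00)
batch = run_cost(500_000, 5_000, 3_000, 1.50, 7.50)
print(f"standard=${standard:,.0f} batch=${batch:,.0f} saved=${standard - batch:,.0f}")
# standard=$30,000 batch=$15,000 saved=$15,000
```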

Splitting 500K into batches of at most 80K each (safe margin under limits):

500,000 / 80,000 = 6.25, so 7 batches (six of 80K and one of 20K)
Total processing time: often 1-3 hours when batches run concurrently, though each batch can take up to 24 hours
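The batch count is a ceiling division, and the last batch carries the remainder. A small sketch of that calculation (the function name is illustrative):

```python
import math


def plan_batches(total_requests: int, per_batch: int) -> list[int]:
    """Return the request count of each batch, filling batches greedily."""
    if per_batch <= 0:
        raise ValueError("per_batch must be positive")
    n = math.ceil(total_requests / per_batch)
    sizes = [per_batch] * n
    if n:
        # The final batch holds whatever is left after the full ones.
        sizes[-1] = total_requests - per_batch * (n - 1)
    return sizes


print(plan_batches(500_000, 80_000))
# [80000, 80000, 80000, 80000, 80000, 80000, 20000]
```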

The Technique

Here is a production-grade batch manager that handles chunking, concurrent submission, and error recovery:

import json
import time
from dataclasses import dataclass, field

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

MAX_REQUESTS = 100_000
MAX_BYTES = 256 * 1024 * 1024  # 256 MB
SAFE_MARGIN = 0.9  # Use 90% of limits for safety

@dataclass
class BatchResult:
    batch_id: str
    succeeded: int = 0
    failed: int = 0
    failed_ids: list = field(default_factory=list)
    results: list = field(default_factory=list)
def chunk_by_limits(
    requests: list[dict],
    max_count: int = int(MAX_REQUESTS * SAFE_MARGIN),
    max_bytes: int = int(MAX_BYTES * SAFE_MARGIN)
) -> list[list[dict]]:
    """Split requests into chunks respecting both hard limits."""
    chunks = []
    current = []
    current_bytes = 0
    for req in requests:
        req_bytes = len(json.dumps(req).encode())
        if len(current) >= max_count or current_bytes + req_bytes > max_bytes:
            if current:
                chunks.append(current)
            current = []
            current_bytes = 0
        current.append(req)
        current_bytes += req_bytes
    if current:
        chunks.append(current)
    return chunks
def submit_batch(requests: list[dict]) -> str:
    """Submit a single batch and return its ID."""
    # Message Batches live under client.messages.batches in the Python SDK.
    batch = client.messages.batches.create(requests=requests)
    return batch.id

def wait_for_batch(batch_id: str) -> BatchResult:
    """Poll batch until complete, return structured result."""
    result = BatchResult(batch_id=batch_id)
    while True:
        status = client.messages.batches.retrieve(batch_id)
        if status.processing_status == "ended":
            counts = status.request_counts
            print(f"  Batch {batch_id[:12]}: "
                  f"{counts.succeeded} ok, {counts.errored} errors")
            break
        time.sleep(30)
    for item in client.messages.batches.results(batch_id):
        if item.result.type == "succeeded":
            result.succeeded += 1
            # Guard against an empty or non-text first content block.
            blocks = item.result.message.content
            text = blocks[0].text if blocks and blocks[0].type == "text" else ""
            result.results.append({"id": item.custom_id, "content": text})
        else:
            # Covers errored, canceled, and expired requests.
            result.failed += 1
            result.failed_ids.append(item.custom_id)
    return result
def process_at_scale(requests: list[dict]) -> dict:
    """Process any number of requests with automatic chunking."""
    chunks = chunk_by_limits(requests)
    print(f"Total: {len(requests)} requests in {len(chunks)} batches")
    for i, chunk in enumerate(chunks):
        size_mb = sum(len(json.dumps(r).encode()) for r in chunk) / 1e6
        print(f"  Batch {i+1}: {len(chunk)} requests, {size_mb:.1f} MB")
    # Submit all batches
    batch_ids = []
    for i, chunk in enumerate(chunks):
        bid = submit_batch(chunk)
        batch_ids.append(bid)
        print(f"  Submitted batch {i+1}: {bid[:12]}")
    # Wait for all batches (could parallelize with threads)
    all_results = []
    all_failed = []
    for bid in batch_ids:
        br = wait_for_batch(bid)
        all_results.extend(br.results)
        all_failed.extend(br.failed_ids)
    return {
        "total_succeeded": len(all_results),
        "total_failed": len(all_failed),
        "failed_ids": all_failed,
        "results": all_results
    }
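The sequential wait loop above blocks on the first batch even if later ones finish sooner. One way to wait on all batches at once is a thread pool, sketched below. The helper name `wait_for_all` is hypothetical, and it takes the wait function as a parameter so it can be exercised without touching the API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def wait_for_all(batch_ids: list[str],
                 wait_fn: Callable[[str], object],
                 max_workers: int = 8) -> dict[str, object]:
    """Run wait_fn(batch_id) for every batch in parallel threads.

    Returns a dict mapping batch_id to whatever wait_fn returned. An
    exception raised for one batch is stored as that batch's value
    instead of aborting the others.
    """
    outcomes: dict[str, object] = {}
    if not batch_ids:
        return outcomes
    workers = min(max_workers, len(batch_ids))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(wait_fn, bid): bid for bid in batch_ids}
        for fut in as_completed(futures):
            bid = futures[fut]
            try:
                outcomes[bid] = fut.result()
            except Exception as exc:  # keep going; the caller decides how to retry
                outcomes[bid] = exc
    return outcomes
```

In the manager above, this would be called as `wait_for_all(batch_ids, wait_for_batch)`. Threads suit this workload because each worker spends nearly all its time sleeping between polls.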

Best practices for production batch processing:

# Pre-submission validation
python3 -c "
import json
# Validate batch before submission
requests = [json.loads(l) for l in open('batch_requests.jsonl')]
total_bytes = sum(len(json.dumps(r).encode()) for r in requests)
total_count = len(requests)
print(f'Requests: {total_count:,} (limit: 100,000)')
print(f'Size: {total_bytes/1e6:.1f} MB (limit: 256 MB)')
if total_count > 100000:
    chunks_needed = -(-total_count // 90000)  # ceiling division
    print(f'WARNING: Must split into {chunks_needed} batches')
if total_bytes > 256 * 1024 * 1024:
    chunks_needed = -(-total_bytes // (230 * 1024 * 1024))  # ceiling division
    print(f'WARNING: Must split into {chunks_needed} batches (size)')
# Validate each request has custom_id
missing_ids = [i for i, r in enumerate(requests) if 'custom_id' not in r]
if missing_ids:
    print(f'ERROR: {len(missing_ids)} requests missing custom_id')
"

Key best practices:

Use 90% of limits to account for JSON serialization overhead differences between your estimate and the API’s parser.
Retry failed requests in separate batches. Collect failed_ids, rebuild those requests, submit as a new batch.
Use deterministic custom_ids. If you need to retry a batch, stable IDs let you match retried results to the original requests and discard duplicates.
Process results promptly. The 29-day retention window is generous but not permanent.
Monitor batch duration. If batches consistently approach the 24-hour timeout, reduce batch size.
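The retry and custom_id practices above can be sketched as three small helpers. All names here are hypothetical, and hashing the request params is only one way to get stable IDs. Check the API documentation for the allowed custom_id length and character set before adopting a scheme.

```python
import hashlib
import json


def make_custom_id(params: dict, prefix: str = "req") -> str:
    """Derive a stable ID from the request params (same params, same ID)."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:32]
    return f"{prefix}-{digest}"


def build_retry_requests(requests: list[dict], failed_ids: list[str]) -> list[dict]:
    """Select the original requests whose custom_id appears in failed_ids."""
    failed = set(failed_ids)
    return [r for r in requests if r["custom_id"] in failed]


def dedupe_results(results: list[dict]) -> list[dict]:
    """Keep the last result seen for each id, so later retries win."""
    latest = {r["id"]: r for r in results}
    return list(latest.values())
```

The retry list goes through the same chunking and submission path as the original run. Because IDs are derived from content, two requests with identical params collide, so add a distinguishing field if duplicates are intentional.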

The Tradeoffs

Operating at batch limits introduces specific risks:

Timeout risk increases with size. A 100K batch during high API load may take longer than a 10K batch. Consider whether 10 x 10K batches (with faster individual completion) is better than 1 x 100K.
Debugging is harder at scale. A single malformed request in 100K is difficult to identify without good custom_id conventions.
Memory pressure. Downloading 100K results creates a large in-memory object. Use streaming result retrieval or process results in pages.
Concurrency limits. Anthropic may restrict the number of concurrent active batches per API key. Check current limits in your account settings.
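For the memory-pressure point, one option is to write each result to disk as it arrives instead of accumulating a list. The sketch below assumes result entries shaped like the ones the batch manager reads (`custom_id`, `result.type`, `result.message.content`) and that the results call yields them lazily. The function name and output format are illustrative.

```python
import json
from typing import Iterable


def stream_results_to_jsonl(items: Iterable, path: str) -> tuple[int, list[str]]:
    """Write succeeded results to a JSONL file one line at a time.

    Returns (succeeded_count, failed_ids). Only the failed IDs are kept
    in memory; result text goes straight to disk.
    """
    succeeded = 0
    failed_ids: list[str] = []
    with open(path, "w", encoding="utf-8") as out:
        for item in items:
            if item.result.type == "succeeded":
                # Join all text blocks; skip any non-text content blocks.
                text = "".join(
                    block.text
                    for block in item.result.message.content
                    if getattr(block, "type", None) == "text"
                )
                out.write(json.dumps({"id": item.custom_id, "content": text}) + "\n")
                succeeded += 1
            else:
                failed_ids.append(item.custom_id)
    return succeeded, failed_ids
```

Pass it the iterator returned by the results call for a batch along with a per-batch output path, then process the JSONL file downstream in whatever page size suits you.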

Implementation Checklist

Estimate per-request payload size (JSON serialized bytes)
Calculate total requests x size to determine chunking needs
Implement the chunking function with 90% safety margin
Add pre-submission validation (count, size, custom_id presence)
Submit chunks and track batch IDs for result retrieval
Implement retry pipeline for failed requests
Set up alerting for batches approaching the 24-hour timeout
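For the last checklist item, a timeout check can be as simple as comparing a batch's age against the 24-hour window. This is a sketch: the function name and the 80% threshold are arbitrary choices, and it assumes you have a timezone-aware creation timestamp for each batch, either recorded at submission or read from the batch object.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

BATCH_TIMEOUT = timedelta(hours=24)


def near_timeout(created_at: datetime,
                 now: Optional[datetime] = None,
                 threshold: float = 0.8) -> bool:
    """True once a batch has used `threshold` of its 24-hour window."""
    now = now or datetime.now(timezone.utc)
    return (now - created_at) >= BATCH_TIMEOUT * threshold
```

Call it from the polling loop and route a True result to whatever alerting channel you already use.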

Measuring Impact

Monitor batch operations at scale:

Chunk efficiency: Actual requests per chunk vs the 100K limit. Closer to 100K means fewer batches and less overhead.
Processing speed: Requests completed per minute. Track across batches to identify slowdowns.
Retry volume: Failed requests requiring resubmission. Should be under 0.1% of total.
Cost verification: Compare total cost for the full batch run against standard API pricing to verify the 50% discount.
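These metrics are easy to compute from numbers the batch manager already has. A minimal sketch, with a hypothetical function name and field names:

```python
def batch_metrics(chunk_sizes: list[int], succeeded: int, failed: int,
                  elapsed_minutes: float, limit: int = 100_000) -> dict:
    """Summarize one run: chunk efficiency, throughput, and retry rate."""
    total = succeeded + failed
    capacity = len(chunk_sizes) * limit
    return {
        # Fraction of the per-batch request limit actually used.
        "chunk_efficiency": sum(chunk_sizes) / capacity if capacity else 0.0,
        # Requests finished (ok or failed) per minute of wall-clock time.
        "requests_per_minute": total / elapsed_minutes if elapsed_minutes > 0 else 0.0,
        # Share of requests that need resubmission; target is under 0.001.
        "retry_rate": failed / total if total else 0.0,
    }
```

Logging this dict after every run gives a baseline, which makes slowdowns or a rising retry rate visible early.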
