Claude Batch API 50% Discount Complete Guide

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS

The Claude Batch API charges exactly half the standard rate on every model, for both input and output tokens. Opus 4.7 drops from $5.00/$25.00 to $2.50/$12.50 per million tokens. Sonnet 4.6 drops from $3.00/$15.00 to $1.50/$7.50. The only requirement: your workload can tolerate asynchronous processing. Most batches finish within an hour, but a batch can take up to 24 hours before it expires.

The Setup

You run a code review pipeline that analyzes 5,000 pull requests per week using Opus 4.7. Each review takes roughly 20,000 input tokens (the code diff plus review instructions) and produces 5,000 output tokens (the review comments).

At standard pricing, your weekly bill is $1,125. Switching the same workload to the Batch API – with identical prompts, models, and output quality – cuts it to $562.50. You save $562.50 per week ($2,437/month) by changing how you submit requests.

The Math

5,000 code reviews per week, Opus 4.7:

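Standard API:

  Input: 5,000 × 20,000 tokens = 100M tokens × $5.00/M = $500.00
  Output: 5,000 × 5,000 tokens = 25M tokens × $25.00/M = $625.00
  Total: $1,125.00/week

Batch API:

  Input: 100M tokens × $2.50/M = $250.00
  Output: 25M tokens × $12.50/M = $312.50
  Total: $562.50/week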

Savings: $562.50/week ($29,250/year)

Full batch pricing table:

Model        Standard in/out ($/M tokens)   Batch in/out ($/M tokens)   Weekly savings (5K reviews)
Opus 4.7     $5.00/$25.00                   $2.50/$12.50                $562.50
Sonnet 4.6   $3.00/$15.00                   $1.50/$7.50                 $337.50
Haiku 4.5    $1.00/$5.00                    $0.50/$2.50                 $112.50
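
To check the savings column yourself, here is a minimal sketch that recomputes it from the per-million-token prices in the table. The function name `weekly_cost` and its default workload (5,000 requests, 20,000 input and 5,000 output tokens each) are illustrative assumptions taken from the scenario above, not part of any SDK.

```python
# Prices in USD per million tokens (input, output), copied from the table above.
STANDARD_PRICES = {
    "Opus 4.7": (5.00, 25.00),
    "Sonnet 4.6": (3.00, 15.00),
    "Haiku 4.5": (1.00, 5.00),
}
BATCH_DISCOUNT = 0.5  # the Batch API bills half the standard rate


def weekly_cost(model, requests=5_000, input_tokens=20_000,
                output_tokens=5_000, batch=False):
    """Return the weekly cost in USD for a fixed-size workload (hypothetical helper)."""
    price_in, price_out = STANDARD_PRICES[model]
    cost = (requests * input_tokens / 1e6) * price_in \
        + (requests * output_tokens / 1e6) * price_out
    return cost * BATCH_DISCOUNT if batch else cost


for model in STANDARD_PRICES:
    standard = weekly_cost(model)
    batched = weekly_cost(model, batch=True)
    print(f"{model}: ${standard:,.2f} standard, ${batched:,.2f} batch, "
          f"${standard - batched:,.2f} saved")
```

Swap in your own request counts and token sizes to estimate savings for a different workload.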

The Technique

The Batch API accepts up to 100,000 requests per batch, with a 256 MB size limit. You submit a batch of requests, receive a batch ID, and poll for results.
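
If your backlog could exceed either limit, one option is to split it before submitting. The sketch below is an assumption-laden illustration: `split_into_batches` is a hypothetical helper, it treats 256 MB as 256 MiB, and it estimates payload size from `json.dumps`, which may differ slightly from what the SDK actually sends. Leaving some headroom below the limit is a reasonable precaution.

```python
import json

MAX_REQUESTS_PER_BATCH = 100_000
MAX_BATCH_BYTES = 256 * 1024 * 1024  # assumption: "256 MB" read as 256 MiB


def split_into_batches(requests, max_requests=MAX_REQUESTS_PER_BATCH,
                       max_bytes=MAX_BATCH_BYTES):
    """Group request dicts into chunks that respect both limits (hypothetical helper)."""
    batches, current, current_bytes = [], [], 0
    for request in requests:
        # Estimated serialized size; +1 accounts for the separating comma.
        size = len(json.dumps(request).encode("utf-8")) + 1
        if size > max_bytes:
            raise ValueError(
                f"request {request.get('custom_id')!r} alone exceeds the size limit")
        # Start a new chunk when adding this request would break either limit.
        if current and (len(current) >= max_requests
                        or current_bytes + size > max_bytes):
            batches.append(current)
            current, current_bytes = [], 0
        current.append(request)
        current_bytes += size
    if current:
        batches.append(current)
    return batches
```

Each returned chunk can then be submitted as its own batch, and you track one batch ID per chunk.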

import anthropic
import time

client = anthropic.Anthropic()

def submit_review_batch(reviews: list[dict]) -> str:
    """Submit code reviews as a batch for 50% savings.

    Each review dict has: id, diff_text, instructions
    """
    requests = []
    for review in reviews:
        requests.append({
            "custom_id": review["id"],
            "params": {
                "model": "claude-opus-4-7-20250415",
                "max_tokens": 8192,
                "messages": [
                    {
                        "role": "user",
                        "content": (
                            f"{review['instructions']}\n\n"
                            f"Code diff:\n```\n{review['diff_text']}\n```"
                        )
                    }
                ]
            }
        })

    # Message Batches live under client.messages.batches in the Python SDK
    batch = client.messages.batches.create(requests=requests)
    print(f"Batch {batch.id}: {len(requests)} requests submitted")
    # Per-request cost assumes 20K input / 5K output tokens on Opus 4.7
    print(f"Estimated cost: ${len(requests) * 0.1125:.2f} "
          f"(vs ${len(requests) * 0.225:.2f} standard)")
    return batch.id


def poll_batch(batch_id: str, interval: int = 30) -> list:
    """Wait for batch completion and return results."""
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        counts = batch.request_counts

        if batch.processing_status == "ended":
            print(f"Done: {counts.succeeded} succeeded, "
                  f"{counts.errored} errored")
            break

        total = (counts.processing + counts.succeeded + counts.errored
                 + counts.canceled + counts.expired)
        print(f"Still processing: {counts.processing}/{total}")
        time.sleep(interval)

    return list(client.messages.batches.results(batch_id))


# Submit 5,000 code reviews
reviews = load_pending_reviews()  # Your data source
batch_id = submit_review_batch(reviews)
results = poll_batch(batch_id)

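# Process results (they are not guaranteed to arrive in submission order)
for result in results:
    review_id = result.custom_id
    if result.result.type == "succeeded":
        review_text = result.result.message.content[0].text
        save_review(review_id, review_text)
    elif result.result.type == "errored":
        log_error(review_id, result.result.error)
    else:
        # "canceled" and "expired" results carry no error payload
        log_error(review_id, result.result.type)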

Key implementation details:

For quick batch submission via the command line:

# Create a JSONL file with batch requests
cat > batch.jsonl << 'JSONL_END'
{"custom_id":"r1","params":{"model":"claude-sonnet-4-6-20250929","max_tokens":2048,"messages":[{"role":"user","content":"Summarize this code: def add(a,b): return a+b"}]}}
{"custom_id":"r2","params":{"model":"claude-sonnet-4-6-20250929","max_tokens":2048,"messages":[{"role":"user","content":"Summarize this code: def mul(a,b): return a*b"}]}}
JSONL_END

# Submit via API
curl -X POST https://api.anthropic.com/v1/messages/batches \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "content-type: application/json" \
  -d "$(jq -sc '{requests: .}' batch.jsonl)"

The Tradeoffs

The 50% discount comes with real constraints:

Implementation Checklist

  1. Identify API workloads that do not require real-time responses
  2. Estimate batch size: count pending requests and total token volume
  3. Verify total batch size stays under 256 MB
  4. Convert real-time API calls to batch request format (add custom_id per request)
  5. Submit batch and implement polling logic for completion
  6. Build result processing pipeline that handles both successes and errors
  7. Compare first batch invoice against standard API cost for the same volume
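
Step 4 in the checklist is mostly mechanical. As a rough sketch, assuming you already hold the keyword arguments for each real-time call as a dict, a wrapper like the hypothetical `to_batch_requests` below adds the `custom_id` envelope and rejects duplicates, since the `custom_id` is how you match each result back to its request.

```python
def to_batch_requests(calls):
    """Wrap (custom_id, params) pairs in the batch request format.

    `params` is the same dict of arguments you would pass to a
    real-time messages call (model, max_tokens, messages, ...).
    Hypothetical helper, not part of the SDK.
    """
    seen = set()
    requests = []
    for custom_id, params in calls:
        if custom_id in seen:
            raise ValueError(f"duplicate custom_id: {custom_id!r}")
        seen.add(custom_id)
        # Copy params so later edits to the caller's dict do not leak in.
        requests.append({"custom_id": custom_id, "params": dict(params)})
    return requests
```

The returned list has the same shape as the `requests` list built in `submit_review_batch` above.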

Measuring Impact

Verify the 50% discount is applying correctly:
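
One way to run this check, sketched under assumptions: total the input and output tokens your batch actually consumed, price them at half the standard rate, and compare the result with what you were invoiced. The helpers `expected_batch_cost` and `discount_looks_applied` are hypothetical, and the 1% tolerance is arbitrary. With the Python SDK, the per-request token counts should be available on each succeeded result's `message.usage` (`input_tokens` and `output_tokens`). This sketch ignores any other billing factors, such as cached tokens.

```python
import math


def expected_batch_cost(usages, price_in, price_out, discount=0.5):
    """Expected invoice in USD from (input_tokens, output_tokens) pairs.

    price_in / price_out are standard prices per million tokens.
    """
    usages = list(usages)
    total_in = sum(i for i, _ in usages)
    total_out = sum(o for _, o in usages)
    standard = total_in / 1e6 * price_in + total_out / 1e6 * price_out
    return standard * discount


def discount_looks_applied(invoiced, expected, rel_tol=0.01):
    """True when the invoiced amount is within rel_tol of the expected cost."""
    return math.isclose(invoiced, expected, rel_tol=rel_tol)


# The code review scenario: 5,000 requests at 20K input / 5K output tokens.
usages = [(20_000, 5_000)] * 5_000
expected = expected_batch_cost(usages, price_in=5.00, price_out=25.00)
print(f"Expected batch invoice: ${expected:,.2f}")
```

If the invoiced figure is closer to the undiscounted amount, check whether some requests went through the real-time endpoint instead of the batch endpoint.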