Move Every Non-Urgent Job to the Batch API and Pay Half Price

Most teams reach for the synchronous /v1/messages endpoint by reflex, even for work nobody is waiting on: nightly summarization, bulk classification, backfilling embeddings, generating product descriptions. Anthropic's Message Batches API and OpenAI's Batch API both run the identical request at 50% of standard token prices (input and output) in exchange for asynchronous delivery. Most batches finish within an hour; the ceiling is 24 hours, after which any requests still unprocessed expire.

Before (wasteful) — 5,000 reviews classified live:

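import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

for review in reviews:  # reviews: list of strings; 5,000 synchronous calls, full price
    client.messages.create(
        model="claude-haiku-4-5", max_tokens=20,
        messages=[{"role": "user", "content": f"Sentiment: {review}"}])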

After (lean) — one batch at half price:

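import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# One request per review. custom_id is how you match results back to inputs,
# because batch results can come back in any order.
batch = client.messages.batches.create(requests=[
    Request(custom_id=f"rev-{i}", params=MessageCreateParamsNonStreaming(
        model="claude-haiku-4-5", max_tokens=20,
        messages=[{"role": "user", "content": f"Sentiment: {r}"}]))
    for i, r in enumerate(reviews)])
print(batch.id)  # keep the id: you need it to poll for and fetch results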
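
Submitting the batch is only half the job: the labels arrive later and have to be fetched. The following is a minimal sketch of that second half, assuming the Anthropic Python SDK's `client.messages.batches.retrieve` and `client.messages.batches.results` calls. The helper name `collect_batch_results`, the `poll_seconds` default, and the injectable `sleep` argument are illustrative choices, not part of the SDK.

```python
import time


def collect_batch_results(client, batch_id, poll_seconds=60, sleep=time.sleep):
    """Wait for a batch to end, then return ({custom_id: text}, failed_ids)."""
    # Poll until processing has ended; before that the status is
    # "in_progress" (or "canceling" if a cancel was requested).
    while client.messages.batches.retrieve(batch_id).processing_status != "ended":
        sleep(poll_seconds)

    labels, failed = {}, []
    # Results may arrive in any order, so key everything by custom_id.
    for entry in client.messages.batches.results(batch_id):
        if entry.result.type == "succeeded":
            labels[entry.custom_id] = entry.result.message.content[0].text
        else:  # "errored", "canceled", or "expired"
            failed.append(entry.custom_id)
    return labels, failed
```

With a real client this would be called as `labels, failed = collect_batch_results(client, batch.id)`. The ids in `failed` can then be resubmitted in a follow-up batch.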

Why it saves money: the discount is pure pricing policy, not a quality or model trade-off. You get the same model and the same tokens, billed at 0.5x. There is no per-token overhead and no minimum batch size, so even a 50-request batch is half-off. One Anthropic batch holds up to 100,000 requests or 256 MB, whichever limit is reached first, so you also collapse thousands of HTTP round-trips into one submission plus a handful of polling calls. That removes most connection overhead, and because batch traffic is rate-limited separately from the live endpoint, most rate-limit friction goes with it.
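
To make the 0.5x arithmetic concrete, here is a rough cost estimator. The function name `estimate_cost`, the token counts, and the per-million-token prices are placeholders chosen for illustration, not quoted rates; substitute the current prices for your model.

```python
def estimate_cost(n_requests, in_tokens, out_tokens,
                  price_in_per_mtok, price_out_per_mtok, batch=False):
    """Estimated USD cost for n_requests of roughly equal size."""
    per_request = (in_tokens * price_in_per_mtok
                   + out_tokens * price_out_per_mtok) / 1_000_000
    multiplier = 0.5 if batch else 1.0  # the 50% batch discount
    return n_requests * per_request * multiplier


# Placeholder workload: 5,000 reviews, ~200 input and ~10 output tokens each,
# at an assumed $1.00 / $5.00 per million input / output tokens.
live = estimate_cost(5_000, 200, 10, 1.00, 5.00)
batched = estimate_cost(5_000, 200, 10, 1.00, 5.00, batch=True)
print(f"live ${live:.4f}, batch ${batched:.4f}")
```

Because the discount is a flat multiplier on both input and output tokens, the saving is the same 50% whether the job has 50 requests or 50,000.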

The only real cost is latency. Reserve the live endpoint for anything user-facing; route everything else to batch. A simple rule: if no human is blocked on the response, it belongs in a batch.
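
The routing rule can be sketched as a small dispatcher. It assumes each job is a dict with a hypothetical `user_waiting` flag that your own code sets, and it splits the deferred work into chunks that respect the 100,000-request limit. It does not check the 256 MB size limit, which would need a byte count of the serialized requests.

```python
MAX_REQUESTS_PER_BATCH = 100_000  # per-batch request limit


def route(jobs, max_per_batch=MAX_REQUESTS_PER_BATCH):
    """Split jobs into (live, batches) by whether a human is waiting."""
    live = [j for j in jobs if j["user_waiting"]]
    deferred = [j for j in jobs if not j["user_waiting"]]
    # Chunk the deferred jobs so no single batch exceeds the request limit.
    batches = [deferred[i:i + max_per_batch]
               for i in range(0, len(deferred), max_per_batch)]
    return live, batches
```

Jobs in `live` go to the synchronous endpoint as before; each list in `batches` becomes one batch submission.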
