Most teams reach for the synchronous /v1/messages endpoint by reflex, even for work nobody is waiting on: nightly summarization, bulk classification, backfilling embeddings, generating product descriptions. Anthropic's Message Batches API and OpenAI's Batch API both process the same requests at 50% of standard token prices (input and output alike) in exchange for asynchronous delivery. Most batches finish within an hour; any request still unprocessed after 24 hours expires.
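To put the discount in numbers, the sketch below prices one workload both ways. Every figure in it (token counts and per-million-token rates) is a hypothetical placeholder, not a quoted price, so substitute your model's current rates. It assumes only the flat 0.5x multiplier on input and output tokens described above and ignores any other pricing adjustments.

```python
def workload_cost(
    n_requests: int,
    in_tokens: int,
    out_tokens: int,
    in_price_per_mtok: float,
    out_price_per_mtok: float,
    batch: bool = False,
) -> float:
    """Dollar cost of n_requests identical requests.

    Prices are dollars per million tokens. Batch pricing applies a flat
    0.5x multiplier to both input and output tokens.
    """
    per_request = (in_tokens * in_price_per_mtok
                   + out_tokens * out_price_per_mtok) / 1_000_000
    return n_requests * per_request * (0.5 if batch else 1.0)


# Hypothetical workload: 5,000 reviews, ~60 input and ~5 output tokens each,
# at placeholder rates of $1 (input) and $5 (output) per million tokens.
live = workload_cost(5_000, 60, 5, 1.0, 5.0)
batched = workload_cost(5_000, 60, 5, 1.0, 5.0, batch=True)
print(f"live ${live:.4f}  batch ${batched:.4f}")
```

With those placeholder inputs, 5,000 short classifications cost about $0.425 live and half that as a batch. The absolute amounts are small, but the ratio holds at any scale.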
Before (wasteful), classifying 5,000 reviews live:
```python
import anthropic

client = anthropic.Anthropic()  # reviews: a list of 5,000 review strings

for review in reviews:  # 5,000 synchronous calls, each billed at full price
    client.messages.create(
        model="claude-haiku-4-5", max_tokens=20,
        messages=[{"role": "user", "content": f"Sentiment: {review}"}])
```
After (lean), one batch at half price:
```python
import anthropic
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
from anthropic.types.messages.batch_create_params import Request

client = anthropic.Anthropic()

# One call submits every review; custom_id links each result back to its input.
batch = client.messages.batches.create(requests=[
    Request(custom_id=f"rev-{i}", params=MessageCreateParamsNonStreaming(
        model="claude-haiku-4-5", max_tokens=20,
        messages=[{"role": "user", "content": f"Sentiment: {r}"}]))
    for i, r in enumerate(reviews)])
```
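Creating the batch only submits the work; results have to be fetched after processing ends. Below is a minimal polling sketch written against the Python SDK's `client.messages.batches.retrieve()` and `client.messages.batches.results()`. The function name `collect_results`, the 60-second poll interval, and mapping failed requests to `None` are illustrative choices, not part of the API, and it assumes each successful reply's first content block is text. The client is passed in as a parameter so the function can be exercised with a stub.

```python
import time
from typing import Any, Dict, Optional


def collect_results(client: Any, batch_id: str,
                    poll_seconds: float = 60.0) -> Dict[str, Optional[str]]:
    """Wait for a Message Batch to end, then map custom_id -> output text.

    Requests that errored, were canceled, or expired map to None so the
    caller can resubmit them.
    """
    batches = client.messages.batches
    # Results are only available once processing_status reaches "ended".
    while batches.retrieve(batch_id).processing_status != "ended":
        time.sleep(poll_seconds)

    outputs: Dict[str, Optional[str]] = {}
    # results() yields one entry per request; order is not guaranteed to
    # match submission order, so everything is keyed by custom_id.
    for entry in batches.results(batch_id):
        if entry.result.type == "succeeded":
            # Assumes the first content block of the reply is a text block.
            outputs[entry.custom_id] = entry.result.message.content[0].text
        else:  # "errored", "canceled", or "expired"
            outputs[entry.custom_id] = None
    return outputs
```

Call it as `collect_results(client, batch.id)` after submitting. Any custom_id mapped to `None` can be gathered into a new, smaller batch and resubmitted.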
Why it saves money: the discount is pure pricing policy, not a quality or model trade-off. You get the same model and the same tokens, billed at 0.5x. There is no per-token surcharge and no minimum batch size, so even a 50-request batch is half off. An Anthropic batch holds up to 100,000 requests or 256 MB, whichever limit is reached first (OpenAI caps a batch at 50,000 requests), so thousands of HTTP round-trips collapse into a single call. That removes per-request connection overhead, and because batch traffic draws on rate limits separate from the live endpoint's, it no longer competes with your interactive requests.
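Workloads larger than one batch have to be split. The helper below is a sketch that chunks request dicts against both limits quoted above; `chunk_requests` is a hypothetical name, and the byte count is an approximation based on each request's compact JSON encoding rather than the exact HTTP body size, with 256 MB taken conservatively as 256,000,000 bytes. In the Python SDK, `Request` and `MessageCreateParamsNonStreaming` are TypedDicts, so they are plain dicts at runtime and pass through unchanged.

```python
import json
from typing import Iterable, Iterator, List

MAX_REQUESTS = 100_000       # per-batch request cap (Anthropic)
MAX_BYTES = 256_000_000      # 256 MB cap, read conservatively as decimal MB


def chunk_requests(requests: Iterable[dict],
                   max_requests: int = MAX_REQUESTS,
                   max_bytes: int = MAX_BYTES) -> Iterator[List[dict]]:
    """Yield lists of requests that each stay under both per-batch limits.

    Size is approximated by each request's compact JSON encoding; the real
    HTTP body adds a small wrapper, so leave some headroom in practice.
    A single request larger than max_bytes is still yielded on its own
    (the API would reject it).
    """
    chunk: List[dict] = []
    size = 0
    for req in requests:
        req_bytes = len(json.dumps(req, separators=(",", ":")).encode("utf-8"))
        # Start a new chunk if adding this request would break either limit.
        if chunk and (len(chunk) >= max_requests or size + req_bytes > max_bytes):
            yield chunk
            chunk, size = [], 0
        chunk.append(req)
        size += req_bytes
    if chunk:
        yield chunk
```

Each yielded chunk can go straight into `client.messages.batches.create(requests=chunk)`. For OpenAI's Batch API, lower `max_requests` to its own cap.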
The only real cost is latency. Reserve the live endpoint for anything user-facing; route everything else to batch. A simple rule: if no human is blocked on the response, it belongs in a batch.
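The rule can be encoded directly in a job dispatcher. This sketch uses a hypothetical `Job` record and `route` function. Its second check, which keeps jobs with a deadline under 24 hours on the live endpoint, is an added assumption derived from the 24-hour batch ceiling: "most batches finish within an hour" describes typical behavior, not a guarantee.

```python
from dataclasses import dataclass
from typing import Optional

BATCH_CEILING_HOURS = 24  # unprocessed batch requests expire after 24 hours


@dataclass
class Job:
    """Hypothetical job description, for illustration only."""
    name: str
    human_waiting: bool                     # is a person blocked on the reply?
    deadline_hours: Optional[float] = None  # None means no hard deadline


def route(job: Job) -> str:
    """Return "live" or "batch" for a job."""
    if job.human_waiting:
        return "live"
    # Only the 24-hour ceiling is safe to plan around, so tighter
    # deadlines stay on the live endpoint (an assumption, see above).
    if job.deadline_hours is not None and job.deadline_hours < BATCH_CEILING_HOURS:
        return "live"
    return "batch"
```

Nightly summaries and weekly backfills route to `"batch"`; a chat reply, or a report due in two hours, stays `"live"`.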