When you tag, score, or route many short items, the expensive part is usually the shared instructions, not the data. If you call the model once per row, you re-send that instruction block every time. Put all the items into one call as a list, and you pay for the instructions once.
Before
A loop that classifies 50 support tickets by sentiment makes 50 calls. Each call carries the full system prompt ("You are a sentiment classifier. Reply with positive, negative, or neutral. Consider sarcasm...", about 120 tokens) plus one ticket (about 30 tokens). Input is roughly 50 x 150 = 7,500 tokens, and 6,000 of those are the same 120-token instruction repeated 50 times.
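A minimal sketch of that per-row loop in Python. `call_model` is a hypothetical stand-in for whatever client you actually use: it only records each request and returns a fixed label, so the loop runs offline and you can count how often the instruction goes out.

```python
# Abbreviated system prompt, quoted as in the text above.
SYSTEM_PROMPT = (
    "You are a sentiment classifier. Reply with positive, negative, or neutral. "
    "Consider sarcasm..."
)

# Every (system, user) pair the loop would have sent.
sent_prompts: list[tuple[str, str]] = []


def call_model(system: str, user: str) -> str:
    """Hypothetical client: logs the request and returns a fixed label."""
    sent_prompts.append((system, user))
    return "neutral"


def classify_one_by_one(tickets: list[str]) -> list[str]:
    # One request per ticket: the full system prompt travels with every row.
    return [call_model(SYSTEM_PROMPT, ticket) for ticket in tickets]


tickets = [f"ticket {i}" for i in range(1, 51)]
labels = classify_one_by_one(tickets)
print(len(sent_prompts))  # 50 requests, each carrying SYSTEM_PROMPT
```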
After
One call: the same 120-token instruction, then a numbered list of all 50 tickets, asking for a JSON array like [{"id":1,"label":"negative"}, ...]. Input is roughly 120 + (50 x 30) = 1,620 tokens, plus a few tokens per item for the numbering. The instruction is paid once instead of 50 times, which cuts input by about 78%.
Classify each ticket's sentiment. Return JSON: [{"id":N,"label":...}].
1. "Took three days to hear back, not great."
2. "Works perfectly, thanks!"
...
50. "..."
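One way to assemble that prompt, sketched in Python under the assumption that tickets arrive as a list of strings. `build_batch_prompt` is a hypothetical helper. Numbering starts at 1 so the ids in the reply line up with the list, and `json.dumps` quotes each ticket and escapes embedded quotes or newlines so one ticket can't spill onto another's line.

```python
import json

INSTRUCTION = 'Classify each ticket\'s sentiment. Return JSON: [{"id":N,"label":...}].'


def build_batch_prompt(tickets: list[str]) -> str:
    """Instruction once, then one numbered, JSON-quoted line per ticket."""
    lines = [INSTRUCTION]
    for i, text in enumerate(tickets, start=1):
        # json.dumps escapes quotes and newlines inside the ticket text.
        lines.append(f"{i}. {json.dumps(text)}")
    return "\n".join(lines)


prompt = build_batch_prompt(
    ["Took three days to hear back, not great.", "Works perfectly, thanks!"]
)
print(prompt)
```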
Why it works
Every call bills its whole prompt, so an instruction block repeated across a loop is paid for on every iteration. Batching sends it once and amortizes it over the whole list. You also cut per-request overhead and the number of round trips.
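The arithmetic from the example, written as two small functions (hypothetical names; token counts are the rough figures from the text, and the batched count ignores the few extra tokens for numbering). Subtracting one from the other leaves (n - 1) x instruction tokens, so the saving grows with the length of the shared prompt.

```python
def per_row_input_tokens(n_items: int, instruction: int, item: int) -> int:
    # Every call re-sends the instruction alongside one item.
    return n_items * (instruction + item)


def batched_input_tokens(n_items: int, instruction: int, item: int) -> int:
    # One call: the instruction once, then every item.
    return instruction + n_items * item


before = per_row_input_tokens(50, 120, 30)
after = batched_input_tokens(50, 120, 30)
print(before, after, f"{1 - after / before:.0%} fewer input tokens")
# 7500 1620 78% fewer input tokens
print(before - after == (50 - 1) * 120)  # True
```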
Guardrails: keep batches modest (try 20-50 items) so a single malformed item can spoil at most one small response, and always key outputs by id so you can detect dropped or merged rows. For very long items, smaller batches keep accuracy steadier. Check that the returned ids match your input ids exactly, not just that the array length matches: a duplicated id can stand in for a dropped one and still give the right count. If any ids are missing, re-run just those rather than the whole batch. Savings depend heavily on your instruction-to-data ratio: the longer your shared prompt relative to each item, the bigger the win.
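A sketch of these guardrails in Python. `send_batch` is a hypothetical callable you supply: it sends one batch of (id, text) pairs (for example, built like the numbered prompt above but keeping the given ids) and returns the model's raw text. The batch size of 25 and the limit of 2 retries are illustrative choices, not provider recommendations. The check compares sets of ids, so a duplicate can't mask a dropped row.

```python
import json
from typing import Callable


def chunks(items: list[str], size: int) -> list[list[tuple[int, str]]]:
    """Split items into batches of at most `size`, keeping 1-based ids."""
    numbered = list(enumerate(items, start=1))
    return [numbered[i:i + size] for i in range(0, len(numbered), size)]


def parse_labels(raw: str, expected_ids: set[int]) -> dict[int, str]:
    """Keep only well-formed entries whose id was actually requested."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {}  # unparseable reply: treat every id as missing
    labels: dict[int, str] = {}
    if not isinstance(data, list):
        return labels
    for entry in data:
        if (
            isinstance(entry, dict)
            and entry.get("id") in expected_ids
            and isinstance(entry.get("label"), str)
        ):
            labels[entry["id"]] = entry["label"]  # a duplicate id overwrites, never adds
    return labels


def classify_batched(
    items: list[str],
    send_batch: Callable[[list[tuple[int, str]]], str],
    size: int = 25,
    max_retries: int = 2,
) -> dict[int, str]:
    results: dict[int, str] = {}
    for batch in chunks(items, size):
        pending = dict(batch)  # id -> text still waiting for a label
        for _ in range(1 + max_retries):
            if not pending:
                break
            # Re-send only the ids that are still missing, with their original ids.
            got = parse_labels(send_batch(list(pending.items())), set(pending))
            results.update(got)
            for i in got:
                del pending[i]
        # Ids still pending after the retries are left out of `results`.
    return results


# Hypothetical fake model that drops id 2 on its first call.
calls: list[list[int]] = []


def fake_send(batch: list[tuple[int, str]]) -> str:
    calls.append([i for i, _ in batch])
    keep = batch if len(calls) > 1 else [p for p in batch if p[0] != 2]
    return json.dumps([{"id": i, "label": "neutral"} for i, _ in keep])


labels = classify_batched(["a", "b", "c"], fake_send)
print(calls)  # [[1, 2, 3], [2]]
```

Because the retry carries the original ids, its results merge straight into the same dictionary, and only the one missing ticket is paid for twice.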