60 ways to spend fewer tokens
The 22 Beginner tips are free to read. The 38 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.
Move Every Non-Urgent Job to the Batch API and Pay Half Price
If a job doesn't need an answer in the next few seconds, send it through the Batch API instead of the live endpoint. The exact same request costs half as much.
Classify a Whole List in One Call, Not One Row at a Time
Send 20-50 items as a numbered list and get back a JSON array of labels, instead of paying for the same instruction prompt on every single row.
Deduplicate and Cache Identical Requests Before They Ever Hit the API
Real-world batches are full of repeats. Hash each request, send each unique prompt once, and fan the answer back out to every duplicate.
Use Targeted Retries with Backoff Instead of Blindly Re-Sending
Distinguish retryable errors from real failures, back off on rate limits, and resend only the failed items so you stop paying for accidental duplicate generations.
Stack Prompt Caching on Top of Your Batch Jobs for Compounding Savings
Batch requests support prompt caching. When every request shares a big instruction block or document, cache it once and the per-request cost collapses.
Run an Async Queue with a Concurrency Cap Instead of Firing All at Once
Push jobs through a bounded worker pool so you saturate your rate limit without tripping it, eliminating the retry storms and tier upgrades that quietly inflate cost.
Collapse Many Tiny Calls into One Structured Request
Ten one-item calls re-send your instructions ten times. Batch the items into a single request with a structured-output schema and pay the overhead once.
Like what you see?
Get a fresh one in your inbox — weekly free, daily on Pro.