The library

60 ways to spend fewer tokens

The 22 Beginner tips are free to read. The 38 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.

All ⚙️ Batching & Automation (7) 💻 Coding Assistants (7) 🧠 Context Management (7) 📊 Measurement & Budgeting (7) 🎚️ Model Selection (4) 📐 Output Control (7) ♻️ Prompt Caching & Reuse (7) ✍️ Prompt Engineering (7) 🔎 Retrieval & RAG (7)

⚙️Batching & Automation ~50% on input + output tokens

Move Every Non-Urgent Job to the Batch API and Pay Half Price

If a job doesn't need an answer in the next few seconds, send it through the Batch API instead of the live endpoint. The exact same request costs half as much.

Beginner 1 min Read →

⚙️Batching & Automation Often 40-70% fewer input tokens on bulk classification, varies with prompt size

Classify a Whole List in One Call, Not One Row at a Time

Send 20-50 items as a numbered list and get back a JSON array of labels, instead of paying for the same instruction prompt on every single row.

Beginner 1 min Read →

⚙️Batching & Automation Varies; commonly 20-60% on duplicate-heavy workloads

Deduplicate and Cache Identical Requests Before They Ever Hit the API

Real-world batches are full of repeats. Hash each request, send each unique prompt once, and fan the answer back out to every duplicate.

Beginner 1 min Read →

⚙️Batching & Automation 🔒 Pro

Use Targeted Retries with Backoff Instead of Blindly Re-Sending

Distinguish retryable errors from real failures, back off on rate limits, and resend only the failed items so you stop paying for accidental duplicate generations.

Intermediate 2 min Unlock →

⚙️Batching & Automation 🔒 Pro

Stack Prompt Caching on Top of Your Batch Jobs for Compounding Savings

Batch requests support prompt caching. When every request shares a big instruction block or document, cache it once and the per-request cost collapses.

Intermediate 1 min Unlock →

⚙️Batching & Automation 🔒 Pro

Run an Async Queue with a Concurrency Cap Instead of Firing All at Once

Push jobs through a bounded worker pool so you saturate your rate limit without tripping it, eliminating the retry storms and tier upgrades that quietly inflate cost.

Advanced 2 min Unlock →

⚙️Batching & Automation 🔒 Pro

Collapse Many Tiny Calls into One Structured Request

Ten one-item calls re-send your instructions ten times. Batch the items into a single request with a structured-output schema and pay the overhead once.

Advanced 2 min Unlock →

Like what you see?

Get a fresh one in your inbox — weekly free, daily on Pro.

Subscribe free Go Pro — $9/mo