The library

60 ways to spend fewer tokens

The 22 Beginner tips are free to read. The 38 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.

All ⚙️ Batching & Automation (7) 💻 Coding Assistants (7) 🧠 Context Management (7) 📊 Measurement & Budgeting (7) 🎚️ Model Selection (4) 📐 Output Control (7) ♻️ Prompt Caching & Reuse (7) ✍️ Prompt Engineering (7) 🔎 Retrieval & RAG (7)

📐Output Control Shrinks classification and routing outputs substantially, frequently 5-15x fewer output tokens per call

Return IDs and Enums, Not Sentences

For classification, routing, and selection tasks, have the model emit a short code, ID, or enum value instead of a polite sentence. The downstream code only needs the token, not the prose around it.

Beginner 2 min Read →

📐Output Control Often 30-60% fewer output tokens on short tasks

Strip the Preamble: Ask for the Answer Only

Chat models love to restate your question, add caveats, and offer follow-ups. On high-volume tasks those wrapper tokens dominate the bill. Tell the model to return only the payload.

Beginner 1 min Read →

📐Output Control Caps runaway costs; output tokens are typically 3-5x the input price

Set max_tokens as a Hard Cost Ceiling, Not an Afterthought

Output tokens are the expensive half of most API bills. Setting an explicit max_tokens on every API call turns an open-ended cost into a known maximum.

Beginner 1 min Read →

📐Output Control Trims tabular output noticeably, commonly 15-40% fewer tokens versus a Markdown table

Emit CSV, Not Markdown Tables

When the model returns rows of data your code will parse, ask for CSV instead of a Markdown table. The pipes, padding spaces, and separator row in Markdown are tokens that carry no data.

Beginner 2 min Read →

📐Output Control 🔒 Pro

Use Stop Sequences to Cut Generation the Instant You Have Enough

A stop sequence halts generation the moment a chosen string appears — you stop paying for output the instant your data is complete, no truncation guesswork required.

Intermediate 1 min Unlock →

📐Output Control 🔒 Pro

Abort the Stream the Moment You Have Enough

When streaming, close the connection as soon as the part you care about arrives instead of letting the model run to its natural stop. You only pay for tokens actually generated before the abort.

Intermediate 1 min Unlock →

📐Output Control 🔒 Pro

Design a Compact Output Schema (and Skip the Pretty-Printing)

When you need structured data, the shape you ask for directly determines token count. Short keys, no markdown scaffolding, and minified output cut tokens on every response — and the input echo if you loop.

Advanced 1 min Unlock →

Like what you see?

Get a fresh one in your inbox — weekly free, daily on Pro.

Subscribe free Go Pro — $9/mo