60 ways to spend fewer tokens
The 22 Beginner tips are free to read. The 38 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.
Return IDs and Enums, Not Sentences
For classification, routing, and selection tasks, have the model emit a short code, ID, or enum value instead of a polite sentence. The downstream code only needs the token, not the prose around it.
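A minimal sketch of the idea, assuming a support-ticket router. The `Route` labels, the `PROMPT` wording, and the `parse_route` helper are illustrative names, not part of any particular SDK. The model is asked for one bare label, and the caller validates it against a closed set.

```python
from enum import Enum


class Route(Enum):
    # Hypothetical routing labels; replace with your own closed set.
    BILLING = "BILLING"
    TECH = "TECH"
    SALES = "SALES"
    OTHER = "OTHER"


# Prompt template that asks for a bare label instead of a sentence.
PROMPT = (
    "Classify the support ticket. Reply with exactly one of: "
    + ", ".join(r.value for r in Route)
    + ". No other text.\n\nTicket: {ticket}"
)


def parse_route(raw: str) -> Route:
    """Map the model's raw reply to a Route, falling back to OTHER."""
    token = raw.strip().upper()
    try:
        return Route(token)
    except ValueError:
        # Anything outside the closed set is treated as unroutable.
        return Route.OTHER
```

Because the reply is a single label, pairing this with a small `max_tokens` value is usually safe, and a reply that fails validation is easy to detect and retry.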
Strip the Preamble: Ask for the Answer Only
Chat models love to restate your question, add caveats, and offer follow-ups. On high-volume tasks those wrapper tokens dominate the bill. Tell the model to return only the payload.
Set max_tokens as a Hard Cost Ceiling, Not an Afterthought
Output tokens are the expensive half of most API bills. Setting an explicit max_tokens on every API call turns an open-ended cost into a known maximum.
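The ceiling is simple arithmetic, sketched below. The per-million-token prices in the usage note are placeholders chosen for illustration, not any provider's real rates, and `worst_case_cost` is a hypothetical helper name.

```python
def worst_case_cost(
    input_tokens: int,
    max_tokens: int,
    in_price_per_mtok: float,
    out_price_per_mtok: float,
) -> float:
    """Upper bound on the cost of one call, in the same currency as the prices.

    Output can never exceed max_tokens, so the output term is a hard cap.
    """
    input_cost = input_tokens * in_price_per_mtok
    output_cap = max_tokens * out_price_per_mtok
    return (input_cost + output_cap) / 1_000_000


def batch_budget(calls: int, per_call_ceiling: float) -> float:
    """Worst-case spend for a batch of identical calls."""
    return calls * per_call_ceiling
```

With placeholder prices of 3.0 per million input tokens and 15.0 per million output tokens, a call with 1,000 input tokens and `max_tokens=200` costs at most 0.006, so 10,000 such calls cost at most about 60. Without `max_tokens`, the output term has no bound you control.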
Emit CSV, Not Markdown Tables
When the model returns rows of data your code will parse, ask for CSV instead of a Markdown table. The pipes, padding spaces, and separator row in Markdown are tokens that carry no data.
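The overhead is easy to see by rendering the same rows both ways. This sketch uses made-up sample rows and compares character counts as a rough proxy for tokens; actual token counts depend on the tokenizer.

```python
import csv
import io

# Made-up sample data; the first row is the header.
ROWS = [
    ("id", "name", "price"),
    ("1", "Widget", "9.99"),
    ("2", "Gadget", "19.50"),
]


def to_markdown(rows):
    """Render rows as a Markdown table with pipes and a separator row."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(r) + " |" for r in body]
    return "\n".join(lines)


def to_csv(rows):
    """Render the same rows as CSV."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def parse_csv(text):
    """Parse model output with the standard csv module (handles quoting)."""
    return [tuple(r) for r in csv.reader(io.StringIO(text))]
```

Parsing with the `csv` module rather than `str.split(",")` keeps values that contain commas intact, provided the model quotes them.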
Use Stop Sequences to Cut Generation the Instant You Have Enough
A stop sequence halts generation the moment a chosen string appears — you stop paying for output the instant your data is complete, no truncation guesswork required.
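The sketch below simulates the behavior locally so the effect is visible without an API call. `apply_stop` is a hypothetical helper that mimics what a server does, under the assumption that the stop string itself is excluded from the returned text. The name of the real request field varies by provider, so check your API reference.

```python
def apply_stop(text: str, stop_sequences: list[str]) -> str:
    """Return text truncated at the earliest occurrence of any stop sequence.

    Local simulation only: a real API stops *generating* at this point,
    so the trailing text is never produced or billed.
    """
    cut = len(text)
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


# What the model might have written without a stop sequence (made-up example).
UNBOUNDED = '{"id": 7}\n###\nLet me know if you would like more detail!'

# The prompt would instruct the model to write "###" after the JSON object.
BOUNDED = apply_stop(UNBOUNDED, ["###"])
```

Here `BOUNDED` contains only the JSON line. Pick a delimiter that cannot appear inside valid data, otherwise generation ends early.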
Abort the Stream the Moment You Have Enough
When streaming, close the connection as soon as the part you care about arrives instead of letting the model run to its natural stop. You only pay for tokens actually generated before the abort.
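A sketch of the consuming side, using a fake generator in place of a real streaming response. `fake_stream` and `read_until` are illustrative names. With a real client you would close the response or connection object in the same place, and billing behavior after a disconnect is provider-specific.

```python
from typing import Iterable, Tuple


def fake_stream():
    """Stand-in for a streaming response; yields text chunks."""
    for chunk in ["<answer>", "42", "</answer>", " Happy", " to", " elaborate!"]:
        yield chunk


def read_until(stream: Iterable[str], end_marker: str) -> Tuple[str, int]:
    """Consume chunks until end_marker appears, then close the stream.

    Returns the text up to and including the marker, plus the number of
    chunks actually read.
    """
    buf = ""
    chunks_read = 0
    try:
        for chunk in stream:
            buf += chunk
            chunks_read += 1
            if end_marker in buf:
                # Drop anything that arrived after the marker in this chunk.
                buf = buf[: buf.index(end_marker) + len(end_marker)]
                break
    finally:
        # Generators and most response objects expose close(); call it if present.
        close = getattr(stream, "close", None)
        if callable(close):
            close()
    return buf, chunks_read
```

In this example the reader stops after three of the six chunks. The marker check runs on the accumulated buffer, so a marker split across two chunks is still detected.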
Design a Compact Output Schema (and Skip the Pretty-Printing)
When you need structured data, the shape you ask for directly determines token count. Short keys, no Markdown scaffolding, and minified output cut tokens on every response, and again on the input side whenever that response is fed back into a later prompt.
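To make the difference concrete, the sketch below serializes one made-up record two ways: pretty-printed with descriptive keys, and minified with single-letter keys. `KEY_MAP`, `compact`, and `expand` are hypothetical helpers, and character count stands in for token count.

```python
import json

# Made-up record with descriptive keys, as application code would want it.
RECORD = {"customer_name": "Ada Lovelace", "order_total": 129.5, "is_priority": True}

# Short keys you would ask the model to emit; kept in one place so they stay documented.
KEY_MAP = {"customer_name": "n", "order_total": "t", "is_priority": "p"}


def verbose(rec: dict) -> str:
    """Pretty-printed JSON with long keys: readable, but heavy."""
    return json.dumps(rec, indent=2)


def compact(rec: dict) -> str:
    """Minified JSON with short keys: the shape to request from the model."""
    short = {KEY_MAP[k]: v for k, v in rec.items()}
    return json.dumps(short, separators=(",", ":"))


def expand(text: str) -> dict:
    """Restore descriptive keys after parsing the model's compact output."""
    inverse = {v: k for k, v in KEY_MAP.items()}
    return {inverse[k]: v for k, v in json.loads(text).items()}
```

The compact form is well under half the length of the verbose one for this record, and `expand` restores the readable keys in code, where they cost nothing.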