60 ways to spend fewer tokens
The 22 Beginner tips are free to read. The 38 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.
Drop the Screenshot Once the Model Has Read It
Images, PDFs, and attachments are charged as tokens and re-sent every turn in a multimodal thread. After the model has described or transcribed one, you usually don't need to keep sending the pixels.
Paste the Function, Not the Whole File
Most coding questions need 20-40 lines, not your 800-line file. Send the relevant slice plus a one-line note about the rest, and your input shrinks dramatically without hurting the answer.
Start a New Chat When the Topic Changes
Chat apps re-send your whole conversation with every message. When you switch tasks, the old turns become dead weight you keep paying to re-transmit — even with caching discounts.
Prune Bulky Tool Results Once You've Used Them
Tool and function-call outputs are the heaviest, most disposable thing in an agent transcript. Once the model has extracted what it needs, replace the raw result with a one-line stub before the next turn.
Compress Long Threads with a Rolling Summary
Instead of dragging a 40-turn thread forward, periodically have the model write a compact state summary, then continue from that.
Keep Working State in a File, Not in the Conversation
On long tasks, let the model write its plan, findings, and decisions to an external scratchpad and re-read only the slice it needs, instead of accumulating all of it as ever-growing conversation history.
Build a Sliding Window with a Cached, Stable Prefix
For API apps, cap history at the last N turns and put unchanging instructions first so prompt caching can discount the prefix.
Like what you see?
Get a fresh one in your inbox — weekly free, daily on Pro.