The library

60 ways to spend fewer tokens

The 22 Beginner tips are free to read. The 38 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.

All ⚙️ Batching & Automation (7) 💻 Coding Assistants (7) 🧠 Context Management (7) 📊 Measurement & Budgeting (7) 🎚️ Model Selection (4) 📐 Output Control (7) ♻️ Prompt Caching & Reuse (7) ✍️ Prompt Engineering (7) 🔎 Retrieval & RAG (7)

🧠Context Management Images can be 1,000-2,000+ tokens each; removing stale ones cuts that per turn

Drop the Screenshot Once the Model Has Read It

Images, PDFs, and attachments are charged as tokens and re-sent every turn in a multimodal thread. After the model has described or transcribed one, you usually don't need to keep sending the pixels.

Beginner 2 min Read →

🧠Context Management 50-90% on file-heavy prompts

Paste the Function, Not the Whole File

Most coding questions need 20-40 lines, not your 800-line file. Send the relevant slice plus a one-line note about the rest, and your input shrinks dramatically without hurting the answer.

Beginner 2 min Read →

🧠Context Management Trims re-sent history; often 20-60% fewer input tokens per turn after a topic switch

Start a New Chat When the Topic Changes

Chat apps re-send your whole conversation with every message. When you switch tasks, the old turns become dead weight you keep paying to re-transmit — even with caching discounts.

Beginner 2 min Read →

🧠Context Management 🔒 Pro

Prune Bulky Tool Results Once You've Used Them

Tool and function-call outputs are the heaviest, most disposable thing in an agent transcript. Once the model has extracted what it needs, replace the raw result with a one-line stub before the next turn.

Intermediate 2 min Unlock →

🧠Context Management 🔒 Pro

Compress Long Threads with a Rolling Summary

Instead of dragging a 40-turn thread forward, periodically have the model write a compact state summary, then continue from that.

Intermediate 2 min Unlock →

🧠Context Management 🔒 Pro

Keep Working State in a File, Not in the Conversation

On long tasks, let the model write its plan, findings, and decisions to an external scratchpad and re-read only the slice it needs, instead of accumulating all of it as ever-growing conversation history.

Advanced 2 min Unlock →

🧠Context Management 🔒 Pro

Build a Sliding Window with a Cached, Stable Prefix

For API apps, cap history at the last N turns and put unchanging instructions first so prompt caching can discount the prefix.

Advanced 2 min Unlock →

Like what you see?

Get a fresh one in your inbox — weekly free, daily on Pro.

Subscribe free Go Pro — $9/mo