The library

60 ways to spend fewer tokens

The 22 Beginner tips are free to read. The 38 advanced tactics unlock with Pro — plus a fresh tip in your inbox every morning.

All ⚙️ Batching & Automation (7) 💻 Coding Assistants (7) 🧠 Context Management (7) 📊 Measurement & Budgeting (7) 🎚️ Model Selection (4) 📐 Output Control (7) ♻️ Prompt Caching & Reuse (7) ✍️ Prompt Engineering (7) 🔎 Retrieval & RAG (7)

♻️Prompt Caching & Reuse up to 90% on the cached portion

Freeze the Prefix: One Stray Timestamp Kills Your Whole Cache

Prompt caching is a prefix match. A single dynamic byte near the top of your prompt silently invalidates everything after it, so you pay full price every call without realizing it.

Beginner 1 min Read →

♻️Prompt Caching & Reuse Cache reads run roughly 0.1x of base input price; the more users hit the same prefix, the closer your shared instructions get to free

Share One Cached System Prompt Across All Your Users

A single per-user byte (name, ID, locale) in the system prompt forks the cache into one entry per user. Strip personalization out of the prefix so every user reads the same cached block.

Beginner 1 min Read →

♻️Prompt Caching & Reuse 🔒 Pro

Cache Your Tool Definitions, Not Just the System Prompt

Tool schemas render before the system prompt, so a non-deterministic tool list silently blocks the cache for everything after it. Sort and freeze the tool array to make tools cacheable.

Intermediate 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Know Your Minimum: Short Prompts Silently Refuse to Cache

Below a model-specific token floor, a cache_control marker does nothing — no error, just a full-price bill. Know the floor before you rely on caching.

Intermediate 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Order Your Prompt by Volatility: Tools, then System, then the Question

The model renders tools, then system, then messages. Put your most stable content first and your most volatile content last, or your breakpoints cache nothing reusable.

Intermediate 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Match TTL to Traffic, and Pre-Warm Before the First User Hits

The default 5-minute cache evaporates between bursts. Choose 5-minute vs 1-hour TTL by your traffic gaps, and pre-warm at startup to kill first-request latency.

Advanced 1 min Unlock →

♻️Prompt Caching & Reuse 🔒 Pro

Read the Usage Block to Prove Your Cache Actually Hits

A cache_control marker that silently never hits looks identical to one that works — until you read the three usage token fields and compute your real hit rate.

Advanced 1 min Unlock →

Like what you see?

Get a fresh one in your inbox — weekly free, daily on Pro.

Subscribe free Go Pro — $9/mo