Stop Pasting Whole Documents: Retrieve the 3 Chunks That Actually Answer the Question

Saves 70-95% on document-heavy prompts | Beginner | 2 min read

The single biggest token leak in DIY RAG is shoving an entire document into context "just in case." The model only needs the few passages that answer the question, but you pay for every token you send.

Before (wasteful):

System: Here is our employee handbook.
[42,000 tokens of handbook]
User: How many days of paid parental leave do I get?

You pay for ~42,000 input tokens on every single question, even though the answer lives in one paragraph.

After (lean):

System: Answer using only the context below.
Context:
[chunk 1: "Parental leave: full-time employees receive 16 weeks..."]
[chunk 2: "Eligibility begins after 90 days of employment..."]
User: How many days of paid parental leave do I get?

Now you send ~400 tokens of context instead of 42,000.

Why this saves tokens: LLM APIs bill per input token. A 42,000-token handbook at, say, $3 per million input tokens costs ~$0.13 per question; the 400-token version costs about a tenth of a cent (~$0.0012). Over thousands of queries that is the difference between a hobby bill and a serious one. It also speeds up responses, since prompt processing time grows with input length.
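To sanity-check that arithmetic for your own numbers, here is a minimal sketch. The $3 per million rate is the illustrative figure from above, not any provider's actual price, and `input_cost_usd` is a hypothetical helper name.

```python
def input_cost_usd(input_tokens: int, usd_per_million: float) -> float:
    """Cost of the input side of one request, in US dollars."""
    return input_tokens * usd_per_million / 1_000_000


PRICE = 3.00  # assumed illustrative rate, USD per million input tokens

full_doc = input_cost_usd(42_000, PRICE)  # whole handbook on every question
retrieved = input_cost_usd(400, PRICE)    # only the retrieved chunks

# Fraction of input spend removed by sending chunks instead of the document.
savings = 1 - retrieved / full_doc

print(f"full document: ${full_doc:.4f} per question")
print(f"retrieved:     ${retrieved:.4f} per question")
print(f"savings:       {savings:.1%}")
print(f"per 10,000 questions: ${full_doc * 10_000:,.2f} vs ${retrieved * 10_000:,.2f}")
```

Swap in your provider's real rate and your own token counts. This only covers the context you inject; the system prompt, the question, and output tokens are billed on top.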

How to do it today without a vector DB:

  • Split the document into ~300-500 token chunks.
  • Embed each chunk once (e.g., OpenAI text-embedding-3-small, Cohere embed-v3, or Voyage) and store the vectors.
  • At query time, embed the question, grab the top 3-5 nearest chunks, and send only those.
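The three steps above can be sketched in plain Python with no vector DB. Everything named here is an assumption for illustration: `toy_embed` is a bag-of-words stand-in so the example runs offline (in practice you would replace it with a call to a real embedding model), and whitespace-separated words are used as a rough proxy for tokens.

```python
import math
import re
from collections import Counter

Vector = dict[str, float]


def chunk_text(text: str, max_tokens: int = 400) -> list[str]:
    """Pack paragraphs into chunks of roughly max_tokens words each."""
    chunks, current, count = [], [], 0
    for para in re.split(r"\n\s*\n", text.strip()):
        if not para.strip():
            continue
        n = len(para.split())  # word count as a rough token proxy
        if current and count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def toy_embed(text: str) -> Vector:
    """Placeholder embedding (sparse word counts). Swap in a real model."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_index(chunks: list[str], embed=toy_embed) -> list[tuple[str, Vector]]:
    """Embed each chunk once; keep (chunk, vector) pairs in memory."""
    return [(c, embed(c)) for c in chunks]


def retrieve(question: str, index, k: int = 3, embed=toy_embed) -> list[str]:
    """Return the k chunks whose vectors are closest to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


handbook = """Parental leave: full-time employees receive 16 weeks of paid parental leave.

Expense reports must be submitted within 30 days of purchase.

Office hours are 9am to 5pm; remote work requires manager approval."""

# Tiny max_tokens so this short sample splits into one chunk per paragraph.
index = build_index(chunk_text(handbook, max_tokens=15))
top = retrieve("How many days of paid parental leave do I get?", index, k=1)
prompt = "Answer using only the context below.\nContext:\n" + "\n".join(top)
print(prompt)
```

For a single handbook, an in-memory list like `index` is usually enough. Persist the vectors (a JSON file or SQLite works) so each chunk is only embedded once.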

A quick gut check: if your average prompt is mostly static reference text and only a sentence or two of actual question, you are almost certainly over-sending. Retrieval flips that ratio. Even a crude keyword search (BM25) beats pasting the whole file, and you can layer embeddings on later. Start by capping injected context at a fixed token budget and measure the drop in your bill.
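If you want to try the keyword route first, here is a rough sketch of BM25 scoring combined with a fixed token budget. It is a simplified, self-written BM25 with common default parameters (k1=1.5, b=0.75), not any particular library's implementation. The function names are hypothetical, and word counts stand in for real token counts, which you would get from your provider's tokenizer.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, chunks: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every chunk against the query with a basic BM25 formula."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: in how many chunks each term appears.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def select_within_budget(query: str, chunks: list[str], budget_tokens: int = 1000) -> list[str]:
    """Take the best-scoring chunks until the token budget is used up."""
    ranked = sorted(zip(bm25_scores(query, chunks), chunks),
                    key=lambda pair: pair[0], reverse=True)
    picked, used = [], 0
    for score, chunk in ranked:
        cost = len(chunk.split())  # word count as a rough token proxy
        if score <= 0 or used + cost > budget_tokens:
            continue  # skip irrelevant chunks and ones that would overflow
        picked.append(chunk)
        used += cost
    return picked


chunks = [
    "Parental leave: full-time employees receive 16 weeks of paid parental leave.",
    "Expense reports must be submitted within 30 days of purchase.",
    "Office hours are 9am to 5pm; remote work requires manager approval.",
]
question = "How many days of paid parental leave do I get?"
scores = bm25_scores(question, chunks)
picked = select_within_budget(question, chunks, budget_tokens=12)
print(picked)
```

The budget is the part that protects your bill: however the ranking is produced, the injected context can never exceed `budget_tokens`. You can later replace `bm25_scores` with embedding similarity without touching the budget logic.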

Applies to: OpenAI API, Anthropic Claude API, Cohere, Gemini API, ChatGPT