Instruction-tuned chat models are trained to be conversational, so by default they wrap a one-line answer in restatement, hedging, and a closing offer to help. On a short extraction or classification task, that wrapper can be larger than the actual answer — and you pay output rates for every word of it.
Before (wasteful):
Prompt:
What's the capital of Australia?Output:
Great question! The capital of Australia is Canberra, not Sydney as many people assume. Canberra was purpose-built as the capital and is located in the Australian Capital Territory. Let me know if you'd like more details!
That's ~45 output tokens to deliver one word.
After (lean):
Prompt:
What's the capital of Australia? Answer with the city name only, no other text.Output:
Canberra
That's ~1-2 output tokens.
Why it saves tokens: Billing counts every generated token, and the preamble/postamble is pure output you're charged the (higher) output rate for. Removing it doesn't change the information you receive. At one call this is trivial; across 100k classification calls it's the difference between paying for one useful token and paying for forty.
Reliable phrasings that suppress the wrapper:
- "Answer with X only. No explanation, no preamble."
- "Respond with a single word/number/line."
- "Do not restate the question or add closing remarks."
A caveat worth knowing: for tasks that genuinely benefit from reasoning (math, multi-step logic), forcing a bare answer can hurt accuracy, because you're suppressing the model's working-out. The lean-answer rule is for retrieval, extraction, classification, and formatting — not for problems where the chain of thought is doing real work. Use a system prompt to apply the rule globally so you don't repeat it on every request.