Strip the Preamble: Ask for the Answer Only

Instruction-tuned chat models are trained to be conversational, so by default they wrap a one-line answer in restatement, hedging, and a closing offer to help. On a short extraction or classification task, that wrapper can be larger than the actual answer — and you pay output rates for every word of it.

Before (wasteful):

Prompt: What's the capital of Australia?

Output: Great question! The capital of Australia is Canberra, not Sydney as many people assume. Canberra was purpose-built as the capital and is located in the Australian Capital Territory. Let me know if you'd like more details!

That's ~45 output tokens to deliver one word.

After (lean):

Prompt: What's the capital of Australia? Answer with the city name only, no other text.

Output: Canberra

That's ~1-2 output tokens.

Why it saves tokens: Billing counts every generated token, and the preamble/postamble is pure output you're charged the (higher) output rate for. Removing it doesn't change the information you receive. At one call this is trivial; across 100k classification calls it's the difference between paying for one useful token and paying for forty.

Reliable phrasings that suppress the wrapper:

"Answer with X only. No explanation, no preamble."
"Respond with a single word/number/line."
"Do not restate the question or add closing remarks."

A caveat worth knowing: for tasks that genuinely benefit from reasoning (math, multi-step logic), forcing a bare answer can hurt accuracy, because you're suppressing the model's working-out. The lean-answer rule is for retrieval, extraction, classification, and formatting — not for problems where the chain of thought is doing real work. Use a system prompt to apply the rule globally so you don't repeat it on every request.

Strip the Preamble: Ask for the Answer Only

Get a fresh tip every morning

More in Output Control

Return IDs and Enums, Not Sentences

Set max_tokens as a Hard Cost Ceiling, Not an Afterthought

Emit CSV, Not Markdown Tables