Most wasted tokens in a chat aren't in the prompt. They're in the retry. You ask for a SQL query, the model returns three paragraphs explaining the query plus the query, and now you re-prompt: "just the SQL, no explanation." You paid for the prose, then paid again for the correction.
Negative and constraint prompting front-loads the rules the model tends to violate, so the first output is usable.
Before
Write a SQL query to get the top 10 customers by revenue.
You get a friendly intro, the query in a fenced block, then a "Note: you may want to add an index..." trailer. You re-prompt to strip it.
After
Write a SQL query for the top 10 customers by revenue. Output only the query in a single code block. No explanation, no commentary, no notes before or after.
One clean generation, no second round trip.
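You can also check mechanically that the first reply obeyed the constraint, so you only re-prompt when the model actually ignored it. This Python sketch assumes the model uses a standard triple-backtick fence. `extract_sole_code_block` is a hypothetical helper, not part of any SDK. The regex is deliberately simple and is not a full Markdown parser.

```python
import re
from typing import Optional

# One fenced block: three backticks, an optional language tag, the body, three backticks.
# Written as `{3} so the pattern itself never contains a literal fence.
FENCE = re.compile(r"`{3}[\w+-]*\n(.*?)\n?`{3}", re.DOTALL)


def extract_sole_code_block(response: str) -> Optional[str]:
    """Return the code if the reply is exactly one fenced block and nothing else.

    Returns None when there is no block, more than one block, or any prose
    before or after it, i.e. when "output only the query" was ignored.
    """
    text = response.strip()
    if len(FENCE.findall(text)) != 1:
        return None
    match = FENCE.fullmatch(text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    ticks = "`" * 3
    clean = ticks + "sql\nSELECT customer_id FROM orders LIMIT 10;\n" + ticks
    chatty = "Sure! Here's the query:\n" + clean + "\nNote: you may want to add an index."
    print(extract_sole_code_block(clean))   # the bare SQL
    print(extract_sole_code_block(chatty))  # None: prose leaked in
```

A `None` result means a retry is actually needed. Any other result can be used as-is.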
Why it works
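The correction turn isn't free: it re-sends the full prior context (your prompt plus the model's verbose reply) as input tokens, then bills you for a fresh completion. After a 600-token chatty answer, the retry's input is at least 600 tokens larger than the original request, because the bloated response is now part of the context you're paying to reprocess. Depending on the model's input and output prices, the redo alone can approach or exceed the cost of the first attempt. The two turns together will always cost more than one constrained call that gets it right the first time.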
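To make the arithmetic concrete, this Python sketch compares a verbose first answer plus a correction turn against a single constrained call. The `Pricing` class, both function names, and every price and token count are hypothetical placeholders. Substitute your provider's published per-token rates and the token counts its API reports.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Per-million-token prices. The values you pass in are assumptions; use your provider's rates."""

    input_per_mtok: float
    output_per_mtok: float

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Dollar cost of one call with the given token counts."""
        return (input_tokens * self.input_per_mtok + output_tokens * self.output_per_mtok) / 1_000_000


def retry_cost(p: Pricing, prompt: int, verbose_reply: int, correction: int, final_reply: int) -> float:
    """Verbose first answer, then a correction turn that re-sends the whole history."""
    first = p.cost(prompt, verbose_reply)
    # Retry input = original prompt + the verbose reply now in history + the correction message.
    second = p.cost(prompt + verbose_reply + correction, final_reply)
    return first + second


def constrained_cost(p: Pricing, prompt: int, constraints: int, final_reply: int) -> float:
    """One call whose prompt states the constraints up front."""
    return p.cost(prompt + constraints, final_reply)


if __name__ == "__main__":
    # Illustrative numbers only: $3 / $15 per million input / output tokens.
    pricing = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
    two_turns = retry_cost(pricing, prompt=40, verbose_reply=600, correction=10, final_reply=80)
    one_turn = constrained_cost(pricing, prompt=40, constraints=12, final_reply=80)
    print(f"verbose + retry: ${two_turns:.5f}")
    print(f"constrained:     ${one_turn:.5f}")
```

With these made-up numbers, the two-turn exchange costs roughly nine times as much as the constrained call. Most of that comes from the 600 verbose tokens, which are billed once at the output rate and again as input on the retry.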
Constraints are cheap to state and expensive to omit. "No preamble, no markdown, max 5 bullets" is roughly a dozen tokens (the exact count depends on the tokenizer), and it can pre-empt a 400-token rewrite. The trick is specificity about the failure mode you've already seen, not generic politeness.
A few constraints that reliably earn their keep:
- "Output only the code / JSON / list — nothing else."
- "Do not restate the question or summarize your answer."
- "If a field is unknown, omit it; do not write 'N/A' or guess."
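The last rule is also easy to verify after the fact. This minimal Python sketch assumes the model returns a flat JSON object. `placeholder_fields` is a hypothetical helper that lists any keys filled with a placeholder or `null` instead of being omitted. The `PLACEHOLDERS` set is an illustrative guess, so extend it with whatever filler your model actually produces.

```python
import json

# Hypothetical filler values a model might write instead of omitting a field.
# Tune this set to what you actually see; it's a heuristic, not a standard.
PLACEHOLDERS = {"n/a", "na", "unknown", "tbd", ""}


def placeholder_fields(raw: str) -> list[str]:
    """Return top-level keys whose values are filler rather than real data."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return [
        key
        for key, value in data.items()
        if value is None
        or (isinstance(value, str) and value.strip().lower() in PLACEHOLDERS)
    ]


if __name__ == "__main__":
    reply = '{"name": "Acme Corp", "phone": "N/A", "fax": null}'
    print(placeholder_fields(reply))
```

An empty list means the constraint held. A non-empty list names the fields to drop client-side, which is usually cheaper than another round trip.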
Keep a personal list of the three corrections you type most often and bake them into your default prompt. You're not adding instructions; you're deleting future retries.
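One way to bake the list in is a tiny helper that appends your standing constraints to every prompt. The same text could equally live in a system prompt or a saved template. In this Python sketch, `DEFAULT_CONSTRAINTS` and `with_defaults` are hypothetical names. The three rules are placeholders to swap for your own most-typed corrections, and the `Constraints:` layout is one readable format, not a required one.

```python
# Hypothetical personal defaults: the three corrections you find yourself typing most often.
DEFAULT_CONSTRAINTS = (
    "Output only the requested code, JSON, or list; nothing else.",
    "Do not restate the question or summarize your answer.",
    "If a field is unknown, omit it; do not write 'N/A' or guess.",
)


def with_defaults(task: str, extra: tuple[str, ...] = ()) -> str:
    """Append the standing constraints, plus any task-specific ones, to a prompt."""
    rules = "\n".join(f"- {rule}" for rule in (*DEFAULT_CONSTRAINTS, *extra))
    return f"{task.strip()}\n\nConstraints:\n{rules}"


if __name__ == "__main__":
    print(with_defaults("Write a SQL query for the top 10 customers by revenue."))
```

Task-specific rules, such as "Max 5 bullets.", go in `extra`. That keeps the defaults stable, so you only edit them when a new correction starts recurring.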