Set max_tokens as a Hard Cost Ceiling, Not an Afterthought

On almost every major API, output tokens cost several times more than input tokens. As of early 2026, Claude Sonnet bills output at roughly 5x its input rate, and GPT-class models commonly sit in the 3-4x range. That means a model that rambles is burning your most expensive token type.
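To make the asymmetry concrete, here is a minimal sketch of the per-call arithmetic. The two rates are assumed placeholder prices chosen only to match the 5x ratio mentioned above, and `call_cost` is a hypothetical helper rather than part of any SDK.

```python
# Assumed illustrative rates in USD per million tokens (5x output/input ratio).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00


def call_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in USD of one call at the assumed rates."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


# Same 200-token prompt, terse answer vs. rambling answer.
terse = call_cost(input_tokens=200, output_tokens=2)
rambling = call_cost(input_tokens=200, output_tokens=150)
print(f"terse: ${terse:.6f}  rambling: ${rambling:.6f}")
```

At these assumed rates the rambling answer costs roughly 4.5 times as much as the terse one, even though the prompt is identical.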

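max_tokens is your hard ceiling: the model cannot emit more than that many output tokens in one response. Where the parameter is optional, leaving it unset usually means the only limit is the model's maximum output length. Anthropic's Messages API requires it, so the equivalent mistake there is pasting in a large value from a quickstart and never revisiting it. Either way, a single confused call can generate thousands of tokens you never wanted.

Before (generous cap, open-ended cost):

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=4096,  # required by this API; far larger than a one-word label needs
    messages=[{"role": "user", "content": "Classify this ticket as bug/feature/question."}]
)
# Model may return a paragraph explaining its reasoning -> 150+ output tokens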

After (capped to the job):

resp = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=5,  # one word fits easily
    messages=[{"role": "user", "content": "Reply with exactly one word: bug, feature, or question."}]
)

Why it saves tokens: You only pay for output tokens actually generated, so max_tokens is a ceiling rather than a fixed charge. Its real value is protecting against the worst case: a malformed prompt, an injection, or a model that decides to "explain" can otherwise keep generating until it reaches whatever limit is in force, which may be the model's maximum output length. Sizing the cap to the task (a few tokens for a label, a few hundred for a summary) bounds the output cost of every single call.
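The worst-case bound described above is simple arithmetic: cap times output price times number of calls. The sketch below assumes a placeholder output rate and a made-up daily call volume; `worst_case_output_cost` and the cap values are hypothetical, chosen only for illustration.

```python
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed illustrative rate, USD per million tokens


def worst_case_output_cost(max_tokens: int, calls: int) -> float:
    """Upper bound on output spend if every call runs all the way to its cap."""
    return max_tokens * calls * OUTPUT_PRICE_PER_MTOK / 1_000_000


calls_per_day = 100_000  # assumed volume
for label, cap in [("label", 5), ("summary", 300), ("large", 4096)]:
    bound = worst_case_output_cost(cap, calls_per_day)
    print(f"{label:>8}: cap={cap:<5} worst case ${bound:,.2f}/day")
```

The typical bill will be far below these bounds, since only generated tokens are charged. What changes with the cap is how bad a day of misbehaving calls can get.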

Size it to the expected output plus a small margin, not the model maximum.
Watch for truncated responses (a stop_reason of "max_tokens" on the response): that's the signal your cap is too tight, so tune rather than guess.
Pair a tight cap with an instruction telling the model to be brief, so it doesn't get cut off mid-sentence.
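The tuning loop above can be sketched as a small check run on each response. It assumes a response object shaped like an Anthropic Messages API result, with a `stop_reason` string and a `usage.output_tokens` count. `check_cap` is a hypothetical helper, the one-quarter threshold for "loose" is an arbitrary assumption, and the stand-in objects replace a real API call so the sketch runs offline.

```python
from types import SimpleNamespace


def check_cap(resp, max_tokens: int) -> str:
    """Report whether the cap truncated the output, looks loose, or is fine."""
    if resp.stop_reason == "max_tokens":
        return "truncated: raise the cap or tighten the prompt"
    # Arbitrary heuristic: output used under a quarter of the cap.
    if resp.usage.output_tokens < max_tokens // 4:
        return "loose: consider lowering the cap"
    return "ok"


# Stand-ins for real responses (assumed shape, no network call).
truncated = SimpleNamespace(
    stop_reason="max_tokens", usage=SimpleNamespace(output_tokens=5)
)
clean = SimpleNamespace(
    stop_reason="end_turn", usage=SimpleNamespace(output_tokens=2)
)
oversized = SimpleNamespace(
    stop_reason="end_turn", usage=SimpleNamespace(output_tokens=20)
)

print(check_cap(truncated, max_tokens=5))
print(check_cap(clean, max_tokens=5))
print(check_cap(oversized, max_tokens=4096))
```

Logging these results over a sample of real traffic gives evidence for where to set the cap, instead of a guess.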
