Stop Paying Frontier Prices for Boilerplate Work

The single biggest model-selection waste is using a flagship model as your default for everything. Classification, formatting, short rewrites, and tag extraction do not need a frontier model, and the price gap is large.

Consider Anthropic's lineup as a concrete example. Claude Opus is roughly $5 input / $25 output per million tokens, Sonnet is around $3 / $15, and Haiku is about $1 / $5. So a tiny task on Opus output costs ~5x what the same task costs on Haiku. The same tiering exists across providers (OpenAI's gpt-5 mini/nano tiers, Gemini Flash vs Pro).

Before (wasteful):

# Every call hits the flagship, including a yes/no check
resp = client.messages.create(
    model="claude-opus-4-8",          # $5 / $25 per MTok
    max_tokens=5,
    messages=[{"role": "user",
               "content": f"Is this email spam? Reply yes or no.\n{email}"}],
)

After (lean):

resp = client.messages.create(
    model="claude-haiku-4-5",         # ~$1 / $5 per MTok
    max_tokens=5,
    messages=[{"role": "user",
               "content": f"Is this email spam? Reply yes or no.\n{email}"}],
)

Why it saves: a binary classifier produces a handful of output tokens and needs little reasoning. The smaller model returns the same answer here, and you pay roughly one-fifth the per-token rate. Latency usually drops too, since smaller models respond faster.

The practical move: audit your call sites and bucket them as trivial (classify, extract, format), moderate (summarize, draft), and hard (multi-step reasoning, tricky code, ambiguous judgment). Route the first two buckets down a tier and reserve the flagship for the last. Validate quality on a sample before rolling out — if a downgraded route regresses, bump it back up. You are not chasing a universal cheap model; you are right-sizing per task.

Stop Paying Frontier Prices for Boilerplate Work

Get a fresh tip every morning

More in Model Selection

Cascade: Try the Cheap Model First, Escalate Only When It Fails

Don't Burn Reasoning Tokens on Tasks That Don't Reason

Use a Big Model as the Planner, Small Models as the Workers