Count Tokens Before You Hit Send, Not After the Bill Arrives

Most people discover a prompt was huge only when the usage dashboard updates hours later. Flip that: measure first, then decide.

Before (blind sending):

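import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

with open("transcript.txt", encoding="utf-8") as f:
    prompt = f.read()  # who knows how big?

response = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[{"role": "user", "content": prompt}],
)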

After (count, then decide):

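# Same messages as the real call, but this only returns a count.
count = client.messages.count_tokens(
    model="claude-opus-4-8",  # count for the SAME model you'll run inference with
    messages=[{"role": "user", "content": prompt}],
)
print(count.input_tokens)  # e.g. 48210
if count.input_tokens > 8000:  # per-feature ceiling; pick your own
    prompt = trim_or_summarize(prompt)  # placeholder for your own helper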

Token counts are model-specific, so always pass the same model ID you'll use for the real call. For OpenAI models, tiktoken counts offline (enc = tiktoken.encoding_for_model("gpt-4o"); len(enc.encode(text))). Anthropic exposes a hosted count_tokens endpoint; Google's SDK has model.count_tokens().

Why it saves tokens: the API is stateless — your full input is re-sent and re-billed on every call. A retrieval step that silently grows from 2K to 50K tokens (someone pasted a whole log file) costs ~25x more on input, on every request, indefinitely. A pre-flight count lets you enforce a per-feature ceiling before the spend happens, not after.
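To make the 25x figure concrete, here is a minimal sketch of the arithmetic. The price constant and the request volume are made-up placeholders, not any provider's real rate; only the ratio matters.

```python
# Hypothetical price for illustration only. Substitute your provider's
# current input rate before trusting the dollar figures.
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # USD, assumed

def input_cost(tokens: int, requests: int = 1) -> float:
    """Input-side cost in USD for `requests` calls of `tokens` tokens each."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS * requests

# Same feature, same traffic, before and after the retrieval step bloats.
small = input_cost(2_000, requests=10_000)
large = input_cost(50_000, requests=10_000)

print(f"2K-token prompt:  ${small:,.2f}")
print(f"50K-token prompt: ${large:,.2f}")
print(f"ratio: {large / small:.0f}x")
```

Whatever rate you plug in, the ratio stays at 25x, because input cost scales linearly with input tokens.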

Two honest caveats:

A rough heuristic of "~4 characters per token" or "~0.75 words per token" is fine for English back-of-envelope math, but it drifts badly for code, JSON, non-Latin scripts, and emoji — use the real tokenizer when the number actually matters.
count_tokens on a hosted API is a network round-trip (tens to a few hundred ms), not free in latency. tiktoken is local and near-instant. Either way, count once per candidate prompt — don't call the hosted endpoint inside a tight loop; cache or batch it.
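Both caveats can be handled with a small wrapper. The sketch below is one possible shape, not a library API: `make_cached_counter`, `rough_estimate`, and `fake_hosted_count` are hypothetical names, and the stand-in counter exists only so the example runs offline.

```python
import hashlib
from typing import Callable, Dict, List

def make_cached_counter(count_fn: Callable[[str], int]) -> Callable[[str], int]:
    """Wrap a token counter so each distinct text is counted only once."""
    cache: Dict[str, int] = {}

    def cached(text: str) -> int:
        # Hash the text so the cache key stays small for large prompts.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = count_fn(text)  # the only place the slow call happens
        return cache[key]

    return cached

def rough_estimate(text: str) -> int:
    """Back-of-envelope count using ~4 characters per token (English prose only)."""
    return max(1, len(text) // 4)

# Stand-in for the hosted endpoint. In real use, pass a function that calls
# your provider's counting endpoint and returns its input token count.
calls: List[str] = []

def fake_hosted_count(text: str) -> int:
    calls.append(text)
    return rough_estimate(text)

count = make_cached_counter(fake_hosted_count)
count("hello world, this is a prompt")
count("hello world, this is a prompt")  # served from the cache
print(len(calls))  # 1
```

The cache is keyed on the text alone, so a real version that counts for several models would need the model ID in the key as well.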

The payoff is a habit: when a single call would blow past your ceiling, you trim, summarize, or chunk before paying — instead of finding out next month.
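For the chunking branch, a minimal sketch might look like the following. `chunk_by_budget` is a hypothetical helper, and the word-count lambda is a stand-in for a real tokenizer. Summing per-paragraph counts is also an approximation, since joining text and wrapping it in a message adds a few tokens, so leave some headroom under the real ceiling.

```python
from typing import Callable, List

def chunk_by_budget(
    paragraphs: List[str],
    count_fn: Callable[[str], int],
    ceiling: int,
) -> List[str]:
    """Greedily pack paragraphs into chunks whose summed count stays <= ceiling."""
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for p in paragraphs:
        n = count_fn(p)
        if n > ceiling:
            # A single oversized paragraph needs trimming or summarizing first.
            raise ValueError(f"paragraph of {n} tokens exceeds ceiling {ceiling}")
        if current and used + n > ceiling:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(p)
        used += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Word count stands in for a tokenizer so the example runs anywhere.
word_count = lambda s: len(s.split())
parts = chunk_by_budget(["a b c", "d e", "f g h i", "j"], word_count, ceiling=5)
print(len(parts))  # 2
```

Each chunk can then be sent as its own call, with the pre-flight count confirming that every one fits before any of them is billed.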