Drop the Screenshot Once the Model Has Read It

Multimodal chats hide a recurring cost: every image and attachment in the thread is converted to tokens and re-sent on every follow-up turn, the same way text is. A single screenshot commonly lands in the high hundreds to a couple thousand input tokens depending on its resolution, and a multi-page PDF can be far more. If you paste a screenshot, get an answer, and keep chatting, you re-pay for those pixels on turn after turn even though you're now only discussing text.

Before (wasteful):

Turn 1: [uploads a 1,600-token dashboard screenshot] "What's the error in this chart?"
Turn 2: "Okay, how do I fix that query?" (screenshot re-sent)
Turn 3: "Write a test for it." (screenshot re-sent again)

Three turns, three full charges for an image you only needed once.

After (lean):

Turn 1: [uploads screenshot] "Transcribe the error text and the failing query verbatim, then we'll continue."
Turn 2+: Continue in a fresh message (or new chat) using the transcribed text. No image re-sent.

You convert the visual into a few dozen tokens of text once, then work from the text.

Why it works: Vision models tokenize images by area or tile count, so a high-res attachment is genuinely expensive, and a stateless API re-bills the whole conversation each turn. Once the salient content is in text form, the pixels add cost without adding information. Removing them shrinks every subsequent request.

Practical moves:

Ask the model to extract what matters (error text, table values, layout notes) on the first turn, then drop the image.
In chat UIs that keep attachments inline, start a new chat once you're past the visual step and paste the extracted text.
Downscale before uploading when fine detail isn't needed; a 4K screenshot of a few lines of red text wastes most of its tokens.
Keep the image only while you're genuinely still asking about visual specifics (exact colors, spatial layout).

The rule of thumb: an image is an expensive input you should consume once and convert to cheap text, not a sticky attachment you drag through the whole conversation.

Drop the Screenshot Once the Model Has Read It

Get a fresh tip every morning

More in Context Management

Paste the Function, Not the Whole File

Start a New Chat When the Topic Changes

Prune Bulky Tool Results Once You've Used Them