Multicorn
ai-101 · tokens · context-window · cost

What Are Tokens and Why Do They Matter?

Tokens are how AI models measure text. This article explains what they are, why they affect your experience, and what context windows and token limits mean in practice.

Multicorn Team

The short version

Every time you use an AI tool, your text is broken into small pieces called tokens. Tokens determine how much text the model can process at once, how much your usage costs, and why long conversations sometimes lose track of earlier details. This article explains what tokens are and why they matter, in plain English.

What is a token?

A token is a small chunk of text that an AI model processes as a single unit. It is not exactly a word, and it is not exactly a character. It is somewhere in between.

As a rough guide:

  • The word "hello" is one token.
  • The word "unbelievable" might be split into three tokens: "un", "believ", "able".
  • A short sentence like "The cat sat on the mat." is about seven tokens.
  • A typical page of English text is roughly 300-400 tokens.

The model does not read words the way you do. Before it processes anything, your text is run through a tokeniser, a tool that breaks the text into tokens based on patterns the model learned during training. Common words stay whole. Longer or less common words get split into pieces.

You do not need to memorise the exact rules. The key takeaway is: tokens are the basic unit of measurement for everything an AI model does.
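If you are curious, the common rule of thumb that one token is roughly four characters of English text can be sketched in a few lines of Python. This is only an estimate; real tokenisers produce different counts depending on the model.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token
    rule of thumb for English. Real tokenisers vary by model."""
    return max(1, round(len(text) / 4))

# "The cat sat on the mat." is 23 characters.
print(estimate_tokens("The cat sat on the mat."))  # estimates 6 (a real tokeniser counts about 7)
print(estimate_tokens("hello"))                    # 1
```

Close enough for budgeting purposes, which is usually all you need.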

What is a context window?

The context window is the total number of tokens a model can handle in a single conversation. Think of it as the model's working memory: everything that fits inside the window is what the model can "see" at once.

The context window includes everything: your messages, the AI's responses, any instructions or system prompts running in the background, and any documents or text you have pasted in.

Here are approximate context window sizes for popular models as of early 2026:

  • GPT-4o (ChatGPT): 128,000 tokens (roughly 200 pages of text)
  • Claude 3.5 Sonnet: 200,000 tokens (roughly 320 pages of text)
  • Gemini 1.5 Pro: up to 2,000,000 tokens (roughly 3,200 pages of text)

These numbers are large, but they are not unlimited. And in practice, the usable window is often smaller than the maximum because of the way models allocate attention across the text.
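As a sketch of what "fits in the window" means in practice, here is a small check against the approximate sizes above. The reserve for the model's reply is an illustrative number, not a rule.

```python
# Approximate context window sizes from the list above (tokens).
CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3.5-sonnet": 200_000,
    "gemini-1.5-pro": 2_000_000,
}

def fits_in_window(token_count: int, model: str, reserve: int = 4_000) -> bool:
    """True if the input fits, leaving `reserve` tokens for the reply."""
    return token_count + reserve <= CONTEXT_WINDOWS[model]

print(fits_in_window(150_000, "gpt-4o"))             # False
print(fits_in_window(150_000, "claude-3.5-sonnet"))  # True
```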

Why do long conversations lose track?

If you have ever had a long conversation with an AI tool and noticed that it "forgot" something you said earlier, the context window is the reason.

As a conversation grows, it eventually approaches the edge of the context window. When that happens, the oldest parts of the conversation may be dropped or given less attention by the model. The result: the AI loses track of details you mentioned early on.

This is not a bug. It is a fundamental constraint of how these models work. The model can only process a fixed amount of text at once. When the conversation exceeds that limit, something has to give.

Practical tip: If you are having a long conversation and the AI seems to be forgetting earlier context, restate the important details in your latest message. Do not assume the model remembers everything from the beginning of the conversation.

Why do tokens affect cost?

If you use AI tools through a paid plan or an API, you are typically charged based on the number of tokens processed. Both your input (what you send) and the output (what the AI generates) count toward the total.

Here is why this matters:

Longer prompts cost more. If you paste a 50-page document and ask the AI to summarise it, you are paying for all 50 pages of input tokens plus the summary output tokens.

Longer responses cost more. If you ask for a detailed, thorough answer, the AI generates more output tokens, which increases the cost.

Repeated context adds up. Every time you send a new message in a conversation, the AI typically reprocesses the entire conversation history. Message 20 in a conversation costs more than message 1 because the model is processing all 19 previous messages again.

For casual use on a fixed-price plan (like ChatGPT Plus at $20/month), token costs are invisible: you pay a flat fee regardless of how many tokens you use, up to a usage cap. But for businesses building applications with the API, or for teams with heavy usage, token costs become a real line item in the budget.
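The "repeated context adds up" effect is easy to see with some back-of-envelope arithmetic. The prices below are made up for illustration; real API prices vary by model and change over time.

```python
# Hypothetical prices for illustration only: $3 per million input
# tokens, $15 per million output tokens.
PRICE_IN = 3 / 1_000_000
PRICE_OUT = 15 / 1_000_000

def conversation_cost(n_messages: int, tokens_per_message: int = 200) -> float:
    """Cost of a conversation where each new message resends the
    full history as input and receives one reply as output."""
    total = 0.0
    history = 0
    for _ in range(n_messages):
        history += tokens_per_message            # your new message joins the history
        total += history * PRICE_IN              # the whole history is reprocessed
        total += tokens_per_message * PRICE_OUT  # the model's reply
        history += tokens_per_message            # the reply joins the history too
    return total

print(f"${conversation_cost(1):.4f}")   # the first message is cheap
print(f"${conversation_cost(20):.4f}")  # message 20 carries all 19 earlier exchanges
```

Twenty messages cost far more than twenty times the first one, because each message pays again for everything before it.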

Tokens and AI agents

For AI agents (covered in What Is an AI Agent? and What Can AI Agents Actually Do Today?), tokens matter even more.

An agent that manages your email, calendar, and Slack might process thousands of tokens per action: reading messages, analysing context, generating responses, and logging results. Over a workday, that adds up to tens of thousands of tokens. Over a month, it can be millions.

This has two practical consequences:

Cost control matters. An agent running without token awareness can generate large bills quickly. Understanding tokens helps you estimate costs and set appropriate limits.

Context window limits affect agent quality. An agent working on a complex, multi-step task needs to keep track of what it has done so far, what it still needs to do, and the details of each step. If the task exceeds the context window, the agent may lose track of earlier steps and make mistakes, just like a chatbot forgetting the beginning of a long conversation.
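One way to keep an agent's spend in check is a hard token budget that stops the run before the bill surprises you. This is a minimal sketch; the class and limits are illustrative and not taken from any real agent framework.

```python
class TokenBudget:
    """Minimal token budget for an agent run. Illustrative only."""

    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0

    def spend(self, tokens: int) -> None:
        """Record token usage; stop the run if the budget is exceeded."""
        if self.used + tokens > self.limit:
            raise RuntimeError("token budget exceeded; stopping the agent")
        self.used += tokens

budget = TokenBudget(limit=50_000)
budget.spend(2_000)  # e.g. reading an email thread
budget.spend(1_500)  # e.g. drafting a reply
print(budget.used)   # 3500
```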

Practical tips for managing tokens

Keep prompts focused. Include the information the model needs and nothing more. Long preambles, repeated instructions, and unnecessary background text all consume tokens without improving results.

Start new conversations for new topics. Instead of continuing a 50-message thread, start fresh when you move to a new subject. This keeps the context clean and avoids hitting the window limit.

Summarise before continuing. If you have a long conversation and need to keep going, ask the AI to summarise the key points so far. Then start a new conversation with that summary as context.

Choose the right model for the task. If you are working with very long documents, choose a model with a larger context window. For short, focused tasks, a smaller (and often cheaper) model may work just as well.
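The summarise-before-continuing tip amounts to keeping the most recent messages and replacing older ones with a summary. In practice you would ask the model itself to write that summary; this sketch just marks where it would go.

```python
def trim_history(messages: list[str], keep_last: int = 4) -> list[str]:
    """Keep the most recent messages and stand in a placeholder
    summary for the older ones."""
    if len(messages) <= keep_last:
        return messages
    summary = f"[summary of {len(messages) - keep_last} earlier messages]"
    return [summary] + messages[-keep_last:]

history = [f"message {i}" for i in range(1, 11)]
print(trim_history(history))
# The ten messages collapse to a summary placeholder plus the last four.
```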

Key takeaways

  • Tokens are the small text chunks that AI models process, roughly three-quarters of a word on average.
  • The context window is the maximum number of tokens a model can handle at once. When conversations exceed it, the model loses track of earlier content.
  • Token usage directly affects cost, especially for API users and businesses running AI agents at scale.
  • Agents consume tokens continuously as they work, making cost control and context management important considerations.
  • Keep prompts focused, start new conversations for new topics, and summarise long threads to manage tokens effectively.

Next up: Is My Data Safe with AI Tools?
