AI Language Model Tokens Explained: Words, Memory, and Limits

In the fascinating realm of AI language models, the concept of 'tokens' is integral to understanding how these digital marvels dissect and comprehend text. Tokens are essentially the building blocks of text processing, and they come in various forms: word tokens, subword tokens, special tokens, and context tokens. Word tokens are whole words; subword tokens capture fragments of words, which is common with long, rare, or morphologically complex words; special tokens serve unique purposes, such as signifying the start or end of a sequence. Context tokens, however, are all about grasping the bigger picture, encoding the broader setting in which words are used.

For this discussion, we'll zoom in on the most prevalent types: word and context tokens.

What Is a Token?

Imagine a token as a snippet of text, a piece of the puzzle in language models like ChatGPT. This snippet might be a single character or an entire word in English. Tokenization, the process of breaking down text into tokens, is sensitive to language nuances, meaning token sizes can vary across languages. Here's an example to illustrate:

Take the sentence: “I don’t like pizza.”

  • “I” is a token
  • " don" (including its leading space) is a token
  • "’t" is treated as a separate token
  • " like" also begins with a space, making it a token
  • " pizza" with its preceding space is a token
  • "." the final period is a token of its own

Notice how spaces and punctuation are part of the token count, influencing the total in a given passage.
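The breakdown above can be mimicked with a toy regex-based tokenizer. This is only a rough sketch: real models use learned byte-pair encodings, so the pattern below is illustrative and not the actual GPT tokenizer.

```python
import re

def toy_tokenize(text):
    # Illustrative rules only: contractions split at the apostrophe,
    # and spaces attach to the word that follows them, mirroring the
    # "I don't like pizza." breakdown above.
    return re.findall(r"’\w+| ?\w+|[^\w\s’]", text)

tokens = toy_tokenize("I don’t like pizza.")
print(tokens)       # ['I', ' don', '’t', ' like', ' pizza', '.']
print(len(tokens))  # 6 tokens, spaces and punctuation included
```

Note how the space before "don" and the final period each contribute to the count, which is why token totals exceed simple word counts.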

You can try counting tokens yourself with OpenAI's online tokenizer tool, which shows exactly how a passage is split and tallied.

Token Limits

When engaging with language models, it's essential to be aware of token limits. These limits dictate how much text the model can generate in response to your input.

Here's a simplified breakdown of token limits for various models:

  • GPT-3.5: Maximum of 4,096 tokens.
  • GPT-3.5-16K: Maximum of 4,096 tokens.
  • GPT-4: Maximum of 8,192 tokens.
  • GPT-4-32K: Maximum of 32,768 tokens.
  • GPT-4 Turbo-128K: Maximum of 4,096 tokens.

These token limits are critical because they influence how lengthy your prompts can be and how detailed the model's responses are. If you exceed these limits, you'll likely encounter errors, as the model can't process or generate texts beyond its capacity.

It's essential to note that both input (your prompt or conversation history) and output (the model’s response) contribute to your token count. It's a common belief that only the input text counts towards tokens, but the model's response length is also a factor. You can direct the language model to generate a specific length of content, such as requesting a 500-word article, to manage the token usage. For instance, if your prompt uses 10 tokens, and the model crafts a response of 15 tokens, the total token count would be 25.
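The arithmetic above can be captured in a few lines. This is a minimal sketch: the 4,096-token budget is just an example figure, and in practice token counts come from the model's tokenizer, not from word counts.

```python
def total_tokens(prompt_tokens, response_tokens):
    # Both the prompt (input) and the model's reply (output)
    # count against the same budget.
    return prompt_tokens + response_tokens

def fits_budget(prompt_tokens, response_tokens, limit=4096):
    # True if the whole exchange stays within the token limit.
    return total_tokens(prompt_tokens, response_tokens) <= limit

print(total_tokens(10, 15))     # 25, as in the example above
print(fits_budget(10, 15))      # True: well within 4,096
print(fits_budget(3000, 2000))  # False: 5,000 exceeds a 4,096 budget
```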

What Are Context Tokens?

Context tokens are essentially the memory span of a language model. Think of it as having a conversation with someone who can only hold onto a certain number of your last sentences before they begin to forget the earlier part of the conversation. Those sentences they can recall at any given moment are like the language model's context tokens.

So, if a language model has a context token limit of 1,000, it can hold on to roughly the last 1,000 tokens of what you've told it (on the order of 750 English words, since tokens are usually shorter than words). Once the conversation goes beyond that, it starts to lose track of the earliest tokens. The more context tokens a language model can handle, the more it can remember from your conversation, helping it respond more coherently and contextually.
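The "forgetting" described above can be sketched as a sliding window over the conversation history. This is an illustration only, with made-up per-message token counts; real systems drop, truncate, or summarize old turns in various ways.

```python
def visible_history(messages, context_limit):
    """Keep the most recent messages whose token counts fit the window.

    messages: list of (text, token_count) pairs, oldest first.
    """
    kept, used = [], 0
    for text, n_tokens in reversed(messages):  # walk newest-first
        if used + n_tokens > context_limit:
            break  # older messages fall out of the model's "memory"
        kept.append((text, n_tokens))
        used += n_tokens
    return list(reversed(kept))  # restore oldest-first order

conversation = [
    ("greeting", 400),
    ("question 1", 350),
    ("answer 1", 300),
    ("question 2", 200),
]
print(visible_history(conversation, 1000))
# With a 1,000-token window, the 400-token greeting no longer fits
# and is "forgotten"; only the last three messages remain visible.
```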

For instance, some of the well-known models include:

  • GPT-3.5: 4,096 tokens
  • GPT-3.5-16K: 16,385 tokens
  • GPT-4: 8,192 tokens
  • GPT-4-32K: 32,768 tokens
  • GPT-4 Turbo-128K: 128,000 tokens

It's important to clear up a common point of confusion regarding tokens. Many people believe that having more tokens means you can process longer texts. However, it's crucial to distinguish between context tokens and word tokens—they are not the same thing.

The term context tokens refers to the amount of text that a language model can take into account at any given moment to understand and generate a response. This is like the model's working memory.

For instance, let's consider GPT-4 Turbo with a context window of 128K tokens. While it can remember up to 128,000 tokens of conversation, which helps it maintain coherence over long discussions, this doesn't mean you can send or receive 128,000 tokens in one go. The actual token limit for GPT-4 Turbo is 4,096 tokens. This is the maximum number of tokens that the model can generate in response to your input at one time. So, while GPT-4 Turbo can keep track of a vast context, any single interaction is still bound by its output token limit of 4,096 tokens.
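The distinction can be expressed as two separate checks, sketched below with GPT-4 Turbo's figures as stated above (a 128,000-token context window and a 4,096-token output cap). The function name and structure are illustrative, not part of any real API.

```python
CONTEXT_WINDOW = 128_000  # total tokens the model can attend to at once
MAX_OUTPUT = 4_096        # tokens the model may generate in one reply

def request_is_valid(history_tokens, requested_output):
    # The reply is capped on its own, and the history plus the reply
    # must still fit inside the context window.
    if requested_output > MAX_OUTPUT:
        return False
    return history_tokens + requested_output <= CONTEXT_WINDOW

print(request_is_valid(100_000, 4_000))  # True: fits both limits
print(request_is_valid(50_000, 10_000))  # False: exceeds the 4,096 output cap
```

In other words, a long conversation can fill most of the context window, but each individual reply is still bounded by the smaller output limit.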

To sum up:

  • Context Tokens: The total amount of text (in tokens) the model can remember from previous parts of the conversation to understand the context.
  • Token Limit: The maximum number of tokens you can input and the model can output at one time.

Understanding the difference between context tokens and token limits helps set realistic expectations for the length and complexity of interactions with language models.