AI chatbots such as ChatGPT, Gemini, and Claude have become popular tools in digital life. Yet users often notice a strange phenomenon: after a long conversation, these models seem to lose their memory, forgetting what was just said, repeating themselves, or answering incorrectly.
According to technology experts, this behavior stems from a key technical concept called the context window.
YouTuber and AI researcher Matt Pocock recently said in a video that the context window is one of the most important yet most misunderstood limitations in how large language models (LLMs) work. Simply put, it is the short-term memory of artificial intelligence.
What is a context window?
Every time a user submits a question and the model answers, the entire text is broken into small units called tokens. Each token can represent a few characters or part of a word. All the tokens in the conversation make up the context the model can see at any one moment.
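As a rough illustration of counting tokens (real LLM tokenizers such as OpenAI's tiktoken use learned subword vocabularies, so actual counts differ), a toy version might simply treat each word or punctuation mark as one token:

```python
import re

def count_tokens(text: str) -> int:
    # Toy approximation: one "token" per word or punctuation mark.
    # Real tokenizers split text into learned subword pieces instead.
    return len(re.findall(r"\w+|[^\w\s]", text))

print(count_tokens("Context windows limit an LLM's short-term memory."))
```

On the sentence above this counts 12 tokens; a production tokenizer would report a different but comparable number.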
If a model's context window is 200,000 tokens, it can only hold that much information. Once the conversation crosses this limit, the oldest data is dropped, causing the AI to forget the beginning of the exchange.
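The dropping of the oldest tokens can be sketched as a simple sliding window (a deliberately tiny, hypothetical truncation policy; real systems may summarize or evict context differently):

```python
from collections import deque

WINDOW = 4  # tiny window for demonstration; real models hold 100k+ tokens

# deque with maxlen silently discards the oldest items once full
context = deque(maxlen=WINDOW)
for token in ["Hi", "my", "name", "is", "Ann", "what", "is", "my", "name"]:
    context.append(token)

print(list(context))  # "Ann" has already slid out of the window
```

After the loop, only the last four tokens remain, so the model no longer "sees" the name it was told at the start.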
For example, Claude 4.5 can remember up to 200,000 tokens, while Gemini 2.5 Pro can process up to 2 million. By contrast, small models such as LLaMA or Mistral variants can be limited to a few thousand.
Why can't AI have unlimited memory?
Expanding the context window is not always feasible. Each additional token consumes computing resources and memory, driving up operating costs.
In addition, when the context grows too large, the model struggles to locate the right details, like finding a needle in a haystack.
More importantly, each model is designed with fixed architectural limits, so its memory cannot be expanded without trading off performance.
Lost in the middle: when AI forgets the middle of the story
Pocock describes a typical symptom of context limits known as "lost in the middle."
AI tends to focus on the beginning (the instructions) and the end (the latest message) of a conversation, while the middle receives less attention.
This comes from how LLMs allocate attention across tokens. Like humans, they prioritize what has happened most recently. As a result, information in the middle, however important, is easily overlooked.
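A toy sketch can make the "lost in the middle" pattern concrete. The weights below are hard-coded for illustration only (real transformer attention is learned, not fixed): each position gets a bonus if it is among the first tokens (the instructions) plus a recency term that grows toward the latest message, and a softmax turns the scores into weights:

```python
import math

def toy_attention_weights(n: int, recency: float = 0.5, primacy: float = 2.0):
    # Illustrative only: score each position with a fixed bonus for
    # the first two tokens plus a term that rises toward the end,
    # then normalize with a softmax.
    scores = [primacy * (1.0 if i < 2 else 0.0) + recency * i / (n - 1)
              for i in range(n)]
    total = sum(math.exp(s) for s in scores)
    return [math.exp(s) / total for s in scores]

weights = toy_attention_weights(10)
middle = sum(weights[3:7])
print(f"middle four positions get {middle:.0%} of attention")
```

Under these made-up parameters, the four middle positions collect well under the 40% share a uniform split would give them, mirroring how mid-conversation details get overlooked.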
This is especially frustrating for programmers. If a developer asks the AI to fix a bug in code from a few hundred lines earlier, the model may no longer remember that exact section because it has slipped out of the context window.
Impact on AI programming tools
Tools like Claude Code or GitHub Copilot also operate within context window limits. When a project or session runs too long, they can forget instructions, give incorrect responses, or stop responding altogether.
Professional users therefore often break work into smaller, concise tasks or restart the session to help the AI stay focused.
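Splitting work to fit the window can be sketched with a hypothetical chunking helper (the function name and parameters are invented for illustration; real tools use more sophisticated strategies such as summarization):

```python
def chunk_tokens(tokens, window, overlap=0):
    # Hypothetical helper: split a long token list into pieces that
    # each fit inside the model's context window. An optional overlap
    # lets consecutive chunks share context at their boundaries.
    step = window - overlap
    return [tokens[i:i + window] for i in range(0, len(tokens), step)]

chunks = chunk_tokens(list(range(10)), window=4)
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Each chunk can then be sent to the model separately, with results stitched together afterward.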
The fact that chatbots like ChatGPT or Gemini "forget" is not a system failure but a natural limitation of current technology.
As technology companies continue to expand context windows and optimize memory, the future could see AI models that remember longer, understand more deeply, and get closer to real AI.