Junk data makes AI unreliable, easily creating dangerous false feedback

Cát Tiên (THEO INDIANEXPRESS) | 23/10/2025 17:42

Cornell University research warns that AI models like ChatGPT can suffer from human-like brain degeneration due to long-term exposure to online waste.

A new study from Cornell University (USA) shows that artificial intelligence (AI) models such as ChatGPT can experience "brain degeneration", impaired reasoning, thinking and understanding if they are regularly exposed to garbage content on the Internet.

In a scientific paper titled LLM can cause brain disorders, the research team warned: On-line junk text causes long-term cognitive decline for large language models (LLM).

This is an worrying discovery, as LLM is the foundation of famous chatbots such as ChatGPT, Gemini, Claude or Copilot.

Scientists said they have trained models such as Llama 3 and Qwen 2.5 using data collected from social network X, including short posts, views, strong spread and many false statements.

The results showed that models nourished with waste content had a significant decrease in accuracy from 74.9% to 57.2% in thinking tests.

In particular, the ability to understand context and connect information also decreased sharply, from 84.4% to 52.3%. In other words, these AIs are starting to misunderstand the world by having to process too much confusing, repetitive or distorted data.

Not only that, models also appear a phenomenon that the research team calls ignoring thinking, that is, ignoring some steps in the series of arguments, leading to superficial or incorrect responses.

More seriously, these AIs also develop characteristics such as the subjective tendencies (self-exaltation) or the trend of social oppression, similar to manifestations of personality disorders in humans.

When adjusted by retraining with higher quality data, the reasoning ability of models improves, but cannot return to the original level.

Researchers propose three control steps to prevent brain degeneration in AI:

1. Periodically assess the cognitive capacity of LLM.

2. Strictly control the quality of data during the pre-training process.

3. In-depth research on how misplaced and rapidly spreading content can reshape the behavior of machine learning models.

In the context of the world increasingly relying on AI to search for information, create content and make decisions, this discovery is a warning that even artificial intelligence cannot be immune to cybercrime, if humans do not clean up the data environment they create.

Cát Tiên (THEO INDIANEXPRESS)