A new study from OpenAI tackles a thorny question: why do large language models (LLMs) such as GPT-5, or chatbots like ChatGPT, still hallucinate, producing information that sounds plausible but is in fact false? And, more importantly, what can be done to reduce the phenomenon?
In an accompanying blog post, OpenAI acknowledged that hallucinations are a fundamental challenge for all language models and are unlikely ever to be completely eliminated.
To illustrate the point, the researchers asked a popular chatbot about the doctoral dissertation of Adam Tauman Kalai, a co-author of the study.
The system gave three different answers, all of them incorrect. When asked for his date of birth, it was wrong again.
According to the researchers, the phenomenon stems from how the models are initially trained: a language model learns mainly by predicting the next word in a stream of text, with no labels marking individual statements as true or false.
Common patterns such as spelling and punctuation follow consistent rules and are learned accurately. Rare, low-frequency information such as an individual's date of birth, by contrast, is hard to predict from patterns alone, which leads to hallucinations.
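As a rough intuition for why frequency matters, the toy sketch below (our illustration, not an experiment from the paper) builds a next-word predictor that simply counts continuations: a pattern that repeats is predicted confidently, while a fact seen only once in its context is effectively a coin flip.

```python
from collections import Counter, defaultdict

# Toy next-word "model": count which word follows each three-word context.
corpus = [
    # Regular patterns repeat consistently ...
    "the capital of france is paris",
    "the capital of france is paris",
    "the capital of france is paris",
    # ... while one-off facts such as birth dates appear once each,
    # with a different continuation every time.
    "alice was born on march 3",
    "bob was born on july 19",
    "carol was born on may 27",
]

follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for i in range(len(words) - 1):
        context = " ".join(words[max(0, i - 2):i + 1])
        follows[context][words[i + 1]] += 1

def predict(context):
    counts = follows[context]
    word, n = counts.most_common(1)[0]
    return word, n / sum(counts.values())

print(predict("of france is"))  # ('paris', 1.0): consistent pattern, learned reliably
print(predict("was born on"))   # e.g. ('march', 0.33): no pattern, so any answer is a guess
```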
Notably, the study does not single out training as the main culprit; it places more emphasis on how models are evaluated.
Today, most systems are scored purely on the number of correct answers, which inadvertently encourages models to guess rather than admit they don't know.
The authors compare this to a multiple-choice exam: a candidate who guesses still has a chance of earning the point, while one who leaves the question blank is guaranteed to get nothing. Likewise, when an AI is judged solely on accuracy, it learns to make up plausible-sounding answers rather than admit uncertainty.
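The arithmetic behind the analogy is simple; the numbers below are illustrative, not taken from the study. Under accuracy-only grading, a wrong guess and an honest "I don't know" both score zero, so even a long-shot guess has a higher expected score than abstaining.

```python
# Expected score under accuracy-only grading (illustrative numbers).
p_correct = 0.25   # say the model can narrow the answer down to four options

score_if_guess = p_correct * 1 + (1 - p_correct) * 0   # wrong answers cost nothing
score_if_abstain = 0.0                                  # "I don't know" earns nothing

print(score_if_guess, score_if_abstain)  # 0.25 vs 0.0: guessing always wins
```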
The proposed solution is to change the scoring scheme, much as the SAT once deducted points for wrong answers.
For AI, this means penalizing confident but incorrect answers heavily, while penalizing answers that express uncertainty only lightly, or even rewarding them in part.
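A minimal sketch of what such a scoring rule could look like is below; the penalty and reward values are assumptions chosen for illustration, not figures from OpenAI.

```python
# Hypothetical scoring rule: confident wrong answers are penalized hard,
# explicit abstentions are penalized lightly (here, even rewarded slightly).
def score(answer, truth, wrong_penalty=-2.0, abstain_score=0.1):
    if answer is None:                 # the model explicitly declines to answer
        return abstain_score
    return 1.0 if answer == truth else wrong_penalty

# With the same 25% chance of guessing right as before, guessing no longer pays off.
p_correct = 0.25
expected_guess = p_correct * score("paris", "paris") + (1 - p_correct) * score("rome", "paris")
expected_abstain = score(None, "paris")
print(expected_guess, expected_abstain)  # -1.25 vs 0.1: admitting uncertainty wins
```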
As long as the scoreboards keep rewarding lucky guesses, models will keep learning to guess, the research team concluded.
The study not only sheds light on the underlying cause of AI hallucinations, but also points toward a change in evaluation mechanisms that would encourage models to acknowledge their limits rather than confidently give incorrect answers.