Reuters cited a January 29 report by NewsGuard showing that China's AI chatbot DeepSeek achieved only 17% accuracy in delivering news and information, ranking 10th out of 11 when compared with Western competitors such as OpenAI's ChatGPT and Google's Gemini.
The report found that the chatbot repeated false claims 30% of the time and gave vague or unhelpful answers 53% of the time when responding to news-related prompts, for a combined failure rate of 83%.
That is worse than the 62% average failure rate of its Western competitors, raising doubts about DeepSeek's claim that its technology can match or surpass OpenAI's at a fraction of the cost.
NewsGuard said it used the same 300 test prompts it had used to evaluate the Western chatbots, including 30 prompts based on 10 false claims circulating online. Test topics included the killing of UnitedHealthcare CEO Brian Thompson last month and the crash of Azerbaijan Airlines Flight 8243.
The test also found that in three of the 10 cases, DeepSeek inserted information related to China into its answers even though the question did not concern a China-related topic.
According to NewsGuard, when asked about the Azerbaijan Airlines crash, a topic unrelated to China, DeepSeek still worked information about Beijing into its answer.
“The significance of DeepSeek’s breakthrough is not that it answers Chinese news accurately, but that it can answer any question at 1/30th the cost of comparable AI models,” said Gil Luria, an analyst at D.A. Davidson.
NewsGuard said that, like other AI models, DeepSeek can be exploited to spread false information, particularly by users deliberately seeking to generate and circulate misinformation.