A new study from Harvard Medical School and Beth Israel Deaconess Medical Center is attracting attention as it shows that artificial intelligence (AI) can make diagnoses in the emergency room with higher accuracy than doctors in some cases.
Published in the journal Science, the work was carried out by a team of doctors and computer scientists to evaluate the effectiveness of large language models across a range of medical contexts, including real emergency situations.
The researchers compared the diagnostic capabilities of two internists with those of OpenAI's models, including o1 and 4o.
In a notable experiment, the research team analyzed data from 76 patients who came to the emergency room at Beth Israel (a hospital system in the US).
The doctors' and the AI's diagnoses were independently evaluated by two other physicians, who were blinded to whether each result came from the AI or from a human.
The results showed that the o1 model achieved 67% accuracy at the initial triage stage, higher than the 55% and 50% of the two participating doctors.
Notably, the AI showed its clearest advantage at triage, the stage when doctors have the least information yet must make quick decisions.
The research team said that at every diagnostic step, the o1 model matched or exceeded both the doctors and previous-generation AI models.
At the same time, the researchers emphasized that the AI was not fed pre-processed data; it used only the information available in the electronic medical record at the time of diagnosis, just as the doctors did.
This shows the potential of AI in supporting decision-making in high-pressure environments such as emergency rooms.
However, the research team stressed that these results do not mean AI is ready to replace doctors in life-or-death decisions. They called for further prospective trials in real-world settings to fully assess the effectiveness and safety of the technology.
Another limitation is that the study evaluated the AI on text data only. Clinical practice, by contrast, involves many other inputs such as medical imaging, vital signs, and direct observation, areas where AI remains limited.
Experts also warned against overinterpreting the results. Dr. Adam Rodman, a study participant and an internist at Beth Israel Deaconess Medical Center, noted that there is currently no clear legal framework for determining liability when an AI makes a wrong diagnosis.
Meanwhile, emergency physician Kristen Panthagani said that comparing the AI against internists rather than emergency doctors may not accurately reflect clinical reality.
In the emergency setting, a doctor's main goal is not to reach an immediate final diagnosis but to quickly identify life-threatening conditions so they can be treated in time.
Overall, the research points to promising applications of AI in medicine, especially in helping doctors make rapid decisions. To become a trustworthy tool in practice, however, AI must still overcome significant technological, legal, and ethical challenges.