An AI chatbot outperformed physicians and physicians plus AI in a trial - What does that mean?

01/08/25 at 03:00 AM

JAMA; by Yulin Hswen, Rita Rubin; 12/27/24
Jonathan Chen, MD, PhD, an assistant professor of medicine, and Ethan Goh, MBBS, MS, a postdoctoral scholar, both at Stanford University, have collaborated for 2 years on studying the integration of human and artificial intelligence to enhance clinical decision-making. They published a randomized clinical trial in JAMA Network Open on October 28 that found that the use of a large language model (LLM) did not significantly enhance physicians’ diagnostic reasoning beyond that of conventional resources. Surprisingly, though, the LLM alone performed better than the physicians did with either the LLM or the conventional resources...

In our pilot study, we thought the doctors who had access to the chatbot were going to do way better than the doctors who only had access to the usual internet - UpToDate, PubMed, Google, whatever. Then when we actually did the randomized study, that didn’t turn out to be the case, which is really weird. The chatbot by itself did surprisingly better than all of the doctors, including the doctors that accessed the chatbot. That flew in the face of the fundamental theorem of informatics: human plus computer will deliver better results than either would alone. That sounds so good, right? I’ve been saying it as the last line in my talks for years. I don’t say it anymore because results like these challenge it.
Publisher's note: An interesting interview with the authors of JAMA's "Large Language Model Influence on Diagnostic Reasoning" article, which we ran 11/16/24.
