Becker's Healthcare January 2, 2025
Large language models like ChatGPT have performed well on medical exams, but they struggle with diagnostic accuracy in real-world clinical interactions.
This is according to a new study led by researchers at Boston-based Harvard Medical School and Stanford (Calif.) University. To conduct the study, the team designed a testing framework, CRAFT-MD, to assess four AI models’ conversation skills and diagnostic accuracy based on scenarios mimicking real-world clinician-patient interactions.
While all four models fared well on medical exam-style questions, they struggled with the basic conversations typical of real-world encounters. Specifically, they showed limitations in asking questions to gather relevant medical history and in synthesizing scattered information to make accurate diagnoses.
“The dynamic nature of medical conversations — the need to...