Becker's Healthcare January 2, 2025
Large language models like ChatGPT have performed well on medical exams, but they struggle with diagnostic accuracy in real-world clinical interactions.
This is according to a new study led by researchers at Boston-based Harvard Medical School and Stanford (Calif.) University. To conduct the study, the team designed a testing framework, CRAFT-MD, to assess four AI models’ conversation skills and diagnostic accuracy based on scenarios mimicking real-world clinician-patient interactions.
While all four models fared well on medical exam-style questions, they struggled with the basic conversations typical of real-world encounters. Specifically, they showed limitations in asking questions to gather relevant medical history and in synthesizing scattered information to make accurate diagnoses.
“The dynamic nature of medical conversations — the need to...