Nature August 7, 2024
Robert Kaczmarczyk, Theresa Isabelle Wilhelm, Ron Martin & Jonas Roos

Abstract

This study evaluates multimodal AI models’ accuracy and responsiveness in answering NEJM Image Challenge questions, juxtaposed with human collective intelligence, underscoring AI’s potential and current limitations in clinical diagnostics. Anthropic’s Claude 3 family demonstrated the highest accuracy among the evaluated AI models, surpassing the average human accuracy, while collective human decision-making outperformed all AI models. GPT-4 Vision Preview exhibited selectivity, responding more to easier questions with smaller images and longer questions.

Multimodal AI for medical diagnosis: potential and challenges

The rapid integration of Large Language Models (LLMs) like GPT-4 into various domains necessitates their evaluation in specialized tasks such as medical diagnostics1,2,3.

Recent studies evaluating the viability of GPT-4V...

Today's Sponsors

LEK
ZeOmega

Today's Sponsor

LEK

 
Topics: AI (Artificial Intelligence), Provider, Survey / Study, Technology, Trends
Google digs deeper into healthcare AI: 5 notes
JP Morgan Annual Healthcare Conference 2025: What are the key talking points likely to be?
How AI Has And Will Continue To Transform Healthcare
AI Translates Nature into New Medicines | StartUp Health Insights: Week of Nov 26, 2024
Building AI trust: The key role of explainability

Share This Article