Evaluating multimodal AI in medical diagnostics

Nature August 7, 2024
Robert Kaczmarczyk, Theresa Isabelle Wilhelm, Ron Martin & Jonas Roos

Abstract

This study evaluates multimodal AI models’ accuracy and responsiveness in answering NEJM Image Challenge questions, juxtaposed with human collective intelligence, underscoring AI’s potential and current limitations in clinical diagnostics. Anthropic’s Claude 3 family demonstrated the highest accuracy among the evaluated AI models, surpassing the average human accuracy, while collective human decision-making outperformed all AI models. GPT-4 Vision Preview exhibited selectivity: it responded more often to easier questions, to questions with smaller images, and to longer questions.
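The abstract distinguishes average human accuracy from collective human decision-making, and the two can differ widely on the same set of votes. The sketch below illustrates that difference under an assumption not stated in this excerpt: that the collective answer is the plurality (most common) vote per question. The function names and the toy votes are hypothetical and are not data from the study.

```python
from collections import Counter


def mean_individual_accuracy(votes, answers):
    """Share of all individual votes that match the correct answer."""
    correct = sum(v == a for qvotes, a in zip(votes, answers) for v in qvotes)
    total = sum(len(qvotes) for qvotes in votes)
    return correct / total


def plurality_accuracy(votes, answers):
    """Share of questions where the most common vote is the correct answer.

    Assumption: the collective answer is the plurality vote per question.
    """
    hits = 0
    for qvotes, a in zip(votes, answers):
        top_choice, _ = Counter(qvotes).most_common(1)[0]
        hits += top_choice == a
    return hits / len(answers)


# Hypothetical toy data: three questions, five respondents each.
answers = ["A", "C", "B"]
votes = [
    ["A", "A", "B", "A", "D"],  # 3 of 5 respondents correct
    ["C", "B", "C", "D", "C"],  # 3 of 5 respondents correct
    ["B", "A", "B", "C", "B"],  # 3 of 5 respondents correct
]

print(mean_individual_accuracy(votes, answers))
print(plurality_accuracy(votes, answers))
```

In this toy case each respondent is right only 60% of the time on average, yet the plurality vote is right on every question. This is one way a model can beat the average human while still trailing the collective.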

Multimodal AI for medical diagnosis: potential and challenges

The rapid integration of Large Language Models (LLMs) like GPT-4 into various domains necessitates their evaluation in specialized tasks such as medical diagnostics^1,2,3.

Recent studies evaluating the viability of GPT-4V...

Topics: AI (Artificial Intelligence), Provider, Survey / Study, Technology, Trends

2024-08-07T19:48:56-04:00
