Testing and Evaluation of Health Care Applications of Large Language Models: A Systematic Review

JAMA Network | October 15, 2024

Suhana Bedi, Yutong Liu, Lucy Orr-Ewing, Dev Dash, Sanmi Koyejo, Alison Callahan, Jason A. Fries, Michael Wornow, Akshay Swaminathan, Lisa Soleymani Lehmann, Hyo Jung Hong, Mehr Kashyap, Akash R. Chaurasia, Nirav R. Shah, Karandeep Singh, Troy Tazbaz, Arnold Milstein, Michael A. Pfeffer, Nigam H. Shah

Key Points

Question
How are health care applications of large language models (LLMs) currently evaluated?

Findings
In this systematic review of 519 studies published between January 1, 2022, and February 19, 2024, only 5% used real patient care data for LLM evaluation. Administrative tasks, such as writing prescriptions, and natural language processing and natural language understanding tasks, such as summarization, were understudied. Accuracy was the predominant dimension of evaluation, while fairness, bias, and toxicity were assessed far less often.

Meaning
Results of this systematic review suggest that current evaluations of LLMs in health care are fragmented and insufficient, and that evaluations need to use real patient data, quantify biases, cover a wider range of tasks and specialties, and assess dimensions beyond accuracy, such as fairness and toxicity.
