Inside Precision Medicine October 15, 2024
Most large language models (LLMs) used in health care settings, such as AI chatbots, are being assessed in a fragmented and inconsistent way that rarely draws on real patient data, researchers have warned.
A new study in the journal JAMA found that just one in 20 evaluation studies included real patient data reflecting the complexities of clinical practice.
Most instead focused on accuracy in answering medical examination questions, leaving limited attention to considerations such as fairness, bias, toxicity, and deployment.
The investigators said that testing these AI programs on hypothetical medical questions has previously been likened to certifying a car for roadworthiness using a multiple-choice questionnaire.
“Real patient care data encompasses the complexities of clinical practice,...