News-Medical.Net January 2, 2025
Artificial intelligence tools such as ChatGPT have been touted for their promise to alleviate clinician workload by triaging patients, taking medical histories and even providing preliminary diagnoses.
These tools, known as large language models, are already being used by patients to make sense of their symptoms and medical test results.
But while these AI models perform impressively on standardized medical tests, how well do they fare in situations that more closely mimic the real world?
Not that great, according to the findings of a new study led by researchers at Harvard Medical School and Stanford University.
For their analysis, published Jan. 2 in Nature Medicine, the researchers designed an evaluation framework (or a test) called CRAFT-MD (Conversational Reasoning Assessment...