HIT Consultant July 22, 2024
What You Should Know:
– Kahun, a company specializing in evidence-based clinical AI, has released a new study comparing the medical capabilities of popular large language models (LLMs) to human experts.
– The findings reveal the limitations of current LLMs in providing reliable information for clinical decision-making.
The Study: Comparing LLMs to Medical Professionals
- LLMs Tested: OpenAI’s GPT-4 and Anthropic’s Claude3-Opus
- Evaluation Method:
  - Kahun developed 105,000 evidence-based medical questions and answers (Q&As) based on real-world physician queries.
  - The Q&As covered various medical disciplines and were categorized as either numerical (e.g., disease prevalence) or semantic (e.g., differentiating dementia subtypes).
  - Six medical professionals answered a subset of the Q&As for comparison.
- Key Findings:
  - Both LLMs performed...