Forbes March 13, 2025
Olga Megorskaya

More than two years after the release of ChatGPT, large language models (LLMs) are now becoming the foundation for agentic AI—autonomous systems that interact with tools in their environment to complete multi-step tasks for the user.

While LLMs like OpenAI’s GPT-4 and Meta’s Llama-3, as well as newer reasoning models such as o1 and DeepSeek-R1, are pushing the boundaries of what these systems can achieve, they still face significant challenges in handling specialized areas of knowledge. A recent study by the University of Massachusetts Amherst analyzed medical summaries generated by leading LLMs, including GPT-4 and Llama-3. The study found issues in nearly every response, such as inconsistencies in medical events, flawed reasoning and chronological errors.

These challenges...
