VentureBeat October 10, 2024
Ben Dickson

Large language models (LLMs) with very long context windows have been making headlines lately. The ability to cram hundreds of thousands or even millions of tokens into a single prompt unlocks many possibilities for developers.

But how well do these long-context LLMs really understand and utilize the vast amounts of information they receive?

Researchers at Google DeepMind have introduced Michelangelo, a new benchmark designed to evaluate the long-context reasoning capabilities of LLMs. Their findings, published in a new research paper, show that while current frontier models have become better at retrieving individual pieces of information from large in-context data, they still struggle with tasks that require reasoning over the structure of that data.
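To see the distinction, consider two kinds of synthetic long-context probes: a retrieval probe, where the answer is a single fact buried in filler, and a state-tracking probe, where the answer depends on every line in the context. The sketch below builds toy versions of both; it is a hypothetical illustration of the general idea, not code or tasks from the Michelangelo benchmark itself.

```python
import random

def make_retrieval_task(n_filler=1000, seed=0):
    """Needle-in-a-haystack: the answer is one fact buried in filler text."""
    rng = random.Random(seed)
    filler = [f"Note {i}: nothing important here." for i in range(n_filler)]
    # Hide a single "needle" line at a random position.
    filler[rng.randrange(n_filler)] = "The secret code is 7421."
    prompt = "\n".join(filler) + "\nQuestion: What is the secret code?"
    return prompt, "7421"

def make_state_tracking_task(n_ops=1000, seed=0):
    """Latent-state task: the answer depends on all operations, not one line.
    Answering requires simulating a running list across the full context."""
    rng = random.Random(seed)
    state, ops = [], []
    for i in range(n_ops):
        if state and rng.random() < 0.4:
            state.pop()
            ops.append("op: pop the last element")
        else:
            state.append(i)
            ops.append(f"op: append {i}")
    prompt = "\n".join(ops) + "\nQuestion: What is the final list length?"
    return prompt, str(len(state))
```

The first task can be solved by scanning for one matching line; the second has no single "needle," which is roughly the kind of structural reasoning the paper reports models still struggle with.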

The need for better long-context benchmarks

The emergence of LLMs with extremely...
