VentureBeat October 10, 2024
Ben Dickson

Large language models (LLMs) with very long context windows have been making headlines lately. The ability to cram hundreds of thousands or even millions of tokens into a single prompt unlocks many possibilities for developers.

But how well do these long-context LLMs really understand and utilize the vast amounts of information they receive?

Researchers at Google DeepMind have introduced Michelangelo, a new benchmark designed to evaluate the long-context reasoning capabilities of LLMs. Their findings, published in a new research paper, show that while current frontier models have progressed in retrieving information from large in-context data, they still struggle with tasks that require reasoning over the data structure.

The need for better long-context benchmarks

The emergence of LLMs with extremely...

Topics: AI (Artificial Intelligence), Technology