VentureBeat May 7, 2023
Large language models (LLMs) are one of the hottest innovations today. With companies like OpenAI and Microsoft working to release impressive new NLP systems, no one can deny the importance of having access to large amounts of quality data.
However, according to recent research by Epoch, we might soon run short of data for training AI models. The team investigated the amount of high-quality data available on the internet. (“High quality” refers to resources like Wikipedia, as opposed to low-quality data, such as social media posts.)
The analysis shows that high-quality data will be exhausted soon, likely before 2026. While the sources for low-quality data will be exhausted only decades later, it’s clear that the...