AXIOS July 27, 2024
Data to train AI models increasingly comes from other AI models in the form of synthetic data, which can fill in chatbots’ knowledge gaps but also destabilize them.
The big picture: As AI models expand in size, their need for data becomes insatiable — but high quality human-made data is costly, and growing restrictions on the text, images and other kinds of data freely available on the web are driving the technology’s developers toward machine-produced alternatives.
State of play: AI-generated data has been used for years to supplement data in some fields, including medical imaging and computer vision, that use proprietary or private data.
- But chatbots are trained on public data collected from across the internet that is increasingly...