VentureBeat, March 28, 2024
A new study from Google’s DeepMind research unit has found that an artificial intelligence system can outperform human fact-checkers when evaluating the accuracy of information generated by large language models.
The paper, titled “Long-form factuality in large language models” and posted on the preprint server arXiv, introduces a method called Search-Augmented Factuality Evaluator (SAFE). SAFE uses a large language model to break down generated text into individual facts, then uses Google Search results to determine the accuracy of each claim.
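The two-stage process described here can be sketched in miniature. The sketch below is a hypothetical illustration, not the paper's code: the sentence-splitting stands in for the LLM's fact decomposition, and a toy in-memory knowledge set stands in for issuing Google Search queries and judging the results.

```python
# Hypothetical sketch of the SAFE-style pipeline; function names and the
# toy knowledge base are assumptions for illustration only.

def split_into_facts(response: str) -> list[str]:
    # Stand-in for the LLM step that decomposes a long-form response
    # into individual factual claims (here: one claim per sentence).
    return [s.strip() for s in response.split(".") if s.strip()]

def search_supports(fact: str, knowledge: set[str]) -> bool:
    # Stand-in for sending search queries and asking an LLM whether
    # the results support the fact; a set lookup plays that role here.
    return fact in knowledge

def rate_response(response: str, knowledge: set[str]) -> dict:
    # Rate each extracted fact as supported or unsupported and tally.
    facts = split_into_facts(response)
    ratings = {fact: search_supports(fact, knowledge) for fact in facts}
    return {
        "ratings": ratings,
        "supported": sum(ratings.values()),
        "total": len(facts),
    }

result = rate_response(
    "Paris is the capital of France. The Moon is made of cheese",
    knowledge={"Paris is the capital of France"},
)
print(result["supported"], "of", result["total"], "facts supported")
```

In the actual system, both stand-in functions would be backed by an LLM and live web search results rather than string matching, but the overall shape (decompose, verify each claim, aggregate) is the same.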
“SAFE utilizes an LLM to break down a long-form response into a set of individual facts and to evaluate the accuracy of each fact using a multi-step reasoning process comprising sending search queries to Google Search and determining whether...