MIT Technology Review April 25, 2024
In many high-stakes situations, large language models are not worth the risk. Knowing which outputs to throw out might fix that.
Large language models are famous for their ability to make things up—in fact, it’s what they’re best at. But their inability to tell fact from fiction has left many businesses wondering if using them is worth the risk.
A new tool created by Cleanlab, an AI startup spun out of a quantum computing lab at MIT, is designed to give high-stakes users a clearer sense of how trustworthy these models really are. Called the Trustworthy Language Model, it gives any output generated by a large language model a score between 0 and 1, according to its reliability. This lets...