AXIOS April 16, 2024
AI makers can’t agree on how to test whether their models behave responsibly, per Stanford’s latest AI Index, released Monday.
Why it matters: Businesses and individual users have little basis for comparison when choosing an AI provider to suit their needs and values.
Catch up quick: “AI models behave very differently for different purposes,” Nestor Maslej, editor of the 2024 AI Index from Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI), told Axios.
- But users lack simple options for comparing them, and there’s no solution in sight.
- The most commonly used benchmark test for responsibility — TruthfulQA — was applied to only three of the five leading AI models assessed by the Stanford team, which include OpenAI’s GPT-4, Meta’s Llama 2...