VentureBeat February 5, 2025
Reasoning models like OpenAI o1 and DeepSeek-R1 have a problem: They overthink. Ask them a simple question such as “What is 1+1?” and they will think for several seconds before answering.
Ideally, like humans, AI models should be able to tell when to give a direct answer and when to spend extra time and resources reasoning before they respond. A new technique presented by researchers at Meta AI and the University of Illinois Chicago trains models to allocate inference budgets based on the difficulty of the query. The result is faster responses, lower costs, and better allocation of compute resources.
Costly reasoning
Large language models (LLMs) can improve their performance on reasoning problems when they produce longer reasoning chains, often...