VentureBeat | January 16, 2026
Researchers at Google have developed a technique that makes it easier for AI models to learn complex reasoning tasks that usually cause large language models (LLMs) to hallucinate or fall apart. Instead of training LLMs through next-token prediction alone, their technique, called internal reinforcement learning (internal RL), steers the model’s internal activations toward developing a high-level, step-by-step solution to the input problem.
Ultimately, this could provide a scalable path toward autonomous agents that handle complex reasoning and real-world robotics tasks without constant manual guidance.
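To make the idea concrete, here is a minimal sketch of what an update of this kind could look like, assuming internal RL scores the model’s hidden activations with a learned critic rather than scoring sampled tokens. The critic, the choice of layer, and the gradient-ascent update rule below are illustrative assumptions, not the researchers’ actual method:

```python
import torch

def internal_rl_step(model, critic, input_ids, optimizer):
    """One hypothetical update that rewards internal activations directly.

    `model` is assumed to expose hidden states (HuggingFace-style API);
    `critic` is a hypothetical learned module scoring activations.
    """
    out = model(input_ids, output_hidden_states=True)
    hidden = out.hidden_states[-1]  # (batch, seq_len, d_model)

    # Hypothetical critic: scores how well the activation trajectory
    # encodes a high-level, step-by-step solution to the problem.
    reward = critic(hidden).mean()

    # Gradient ascent on the reward steers the activations themselves,
    # rather than minimizing a next-token cross-entropy loss.
    loss = -reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(reward)
```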
The limits of next-token prediction
Reinforcement learning plays a key role in post-training LLMs, particularly for complex reasoning tasks that require long-horizon planning. However, a core limitation lies in the architecture of these models. LLMs are autoregressive, generating output one token at a time, so the training signal rewards predicting the immediate next token rather than forming a coherent multi-step plan.
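For context, the standard objective the article contrasts against looks like this in PyTorch: each position is trained only to predict the single token that follows it (variable names are illustrative):

```python
import torch
import torch.nn.functional as F

def next_token_loss(logits: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
    """Standard autoregressive objective: position t predicts token t+1.

    logits: (batch, seq_len, vocab_size) model outputs
    tokens: (batch, seq_len) input token ids
    """
    preds = logits[:, :-1, :]  # predictions for positions 0..T-2
    targets = tokens[:, 1:]    # the token that actually comes next
    # Nothing in this loss looks more than one step ahead, which is why
    # long-horizon planning is hard to learn from it alone.
    return F.cross_entropy(
        preds.reshape(-1, preds.size(-1)),
        targets.reshape(-1),
    )
```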