VentureBeat March 13, 2025
Ben Dickson

Reasoning through chain-of-thought (CoT), the process by which models break problems into manageable "thoughts" before arriving at answers, has become an integral part of the latest generation of frontier large language models (LLMs).

However, the inference costs of reasoning models can quickly stack up as models generate excess CoT tokens. In a new paper, researchers at Carnegie Mellon University propose an LLM training technique that gives developers more control over the length of the CoT.

Called length controlled policy optimization (LCPO), the technique conditions the model to provide correct answers while also keeping its "thoughts" within a predetermined token budget. Experiments show that models trained with LCPO provide a smooth tradeoff between accuracy and cost and can surprisingly outperform...
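The article does not reproduce the paper's exact reward formulation, but the idea it describes can be sketched roughly: during training, the model is told a target token budget, and its reward combines answer correctness with a penalty for straying from that budget. Below is a minimal illustrative sketch in Python; the function name, the penalty weight, and the linear penalty form are assumptions for illustration, not the paper's exact method.

    # Illustrative sketch only: a reward that trades off answer correctness
    # against how far the chain-of-thought length strays from the prompted
    # token budget. The weight `alpha` and the linear penalty are assumed.
    def length_controlled_reward(answer_correct: bool,
                                 cot_tokens: int,
                                 target_budget: int,
                                 alpha: float = 0.003) -> float:
        correctness = 1.0 if answer_correct else 0.0
        length_penalty = alpha * abs(cot_tokens - target_budget)
        return correctness - length_penalty

    # Example: a correct answer that overshoots a 512-token budget by 200
    # tokens still earns a positive, but reduced, reward.
    print(length_controlled_reward(True, 712, 512))  # 1.0 - 0.003 * 200 = 0.4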
