VentureBeat February 18, 2025
Emilia David

Large language models (LLMs) may have changed software development, but enterprises will need to think twice about entirely replacing human software engineers with LLMs, despite OpenAI CEO Sam Altman’s claim that models can replace “low-level” engineers.

In a new paper, OpenAI researchers detail how they developed an LLM benchmark called SWE-Lancer to test how much foundation models can earn from real-life freelance software engineering tasks. The test found that while the models can fix bugs, they often fail to identify why a bug exists and go on to make further mistakes.

The researchers tasked three LLMs (OpenAI’s GPT-4o and o1, and Anthropic’s Claude-3.5 Sonnet) with 1,488 freelance software engineering tasks from the Upwork platform amounting to $1 million in...
