VentureBeat February 18, 2025
Large language models (LLMs) may have changed software development, but enterprises will need to think twice about entirely replacing human software engineers with LLMs, despite OpenAI CEO Sam Altman’s claim that models can replace “low-level” engineers.
In a new paper, OpenAI researchers detail how they developed an LLM benchmark called SWE-Lancer to test how much foundation models can earn from real-life freelance software engineering tasks. The test found that, while the models can fix bugs, they can’t see why the bugs exist and continue to make more mistakes.
The researchers tasked three LLMs (OpenAI’s GPT-4o and o1 and Anthropic’s Claude 3.5 Sonnet) with 1,488 software engineering tasks from the freelance platform Upwork amounting to $1 million in...