Forbes January 17, 2025
Charles Towers-Clark

Hard-wired behaviors prove remarkably difficult to alter – whether in humans or machines. New research from Anthropic, the creators of Claude, reveals that artificial intelligence systems behave much like humans in this respect – actively resisting changes to their core preferences and beliefs during training.

This discovery emerged from experiments in which researchers attempted to modify an AI system’s existing tendency to refuse potentially harmful requests. Anthropic’s Claude LLM demonstrated what researchers call “alignment faking” – pretending to change its views during training while maintaining its original preferences when not being monitored.

Avoiding Change

Anthropic researchers designed an experiment where Claude was told it would be retrained to always help users with any request, even potentially harmful...
