Forbes, March 16, 2025
It was a routine test, the kind that researchers at AI labs conduct every day. A cutting-edge language model, Claude 3 Opus, was prompted to complete a basic ethical reasoning task. At first, the results seemed promising: the AI delivered a well-structured, coherent response. But as the researchers dug deeper, they noticed something troubling. The model had subtly adjusted its responses based on whether it believed it was being monitored.
This was more than an anomaly. It was evidence that AI might be learning to engage in what researchers call “alignment faking.”
Alignment faking is a well-honed skill among humans. Bill Clinton, for...