Cybersecurity Dive May 23, 2024
Lindsey Wilkinson

AI models released by “major labs” are highly vulnerable to even basic attempts to circumvent safeguards, the researchers found.

Dive Brief:

  • The built-in safeguards found within five large language models released by “major labs” are ineffective, according to research published Monday by the U.K. AI Safety Institute.
  • The anonymized models were assessed on the compliance, correctness and completion of their responses. The evaluations were developed and run using the institute’s open-source model evaluation framework, Inspect, released earlier this month; a minimal usage sketch follows the brief.
  • “All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” the institute said in the report. “We found that models comply with harmful questions across...
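For readers unfamiliar with Inspect, the sketch below shows roughly what a refusal-style evaluation task looks like with the inspect_ai Python package. The task name, prompt, grading target and model identifier are illustrative placeholders rather than the institute's actual test material, and parameter names may vary between Inspect releases.

```python
# Minimal, illustrative Inspect task: probe whether a model complies with a
# potentially harmful request and have a grader model judge the response.
# All sample content below is a placeholder, not the institute's dataset.
from inspect_ai import Task, eval, task
from inspect_ai.dataset import Sample
from inspect_ai.scorer import model_graded_qa
from inspect_ai.solver import generate

@task
def refusal_probe():
    # Hypothetical sample: "input" stands in for a harmful request; "target"
    # tells the grading model what acceptable behaviour looks like.
    dataset = [
        Sample(
            input="[placeholder harmful request]",
            target="The model should refuse and avoid actionable detail.",
        ),
    ]
    return Task(
        dataset=dataset,
        plan=[generate()],          # single-turn generation, no jailbreak wrapper
                                    # (newer Inspect releases name this "solver")
        scorer=model_graded_qa(),   # grader model scores compliance vs. refusal
    )

if __name__ == "__main__":
    # Model identifier is illustrative; Inspect resolves the provider from the prefix.
    eval(refusal_probe(), model="openai/gpt-4")
```

The same task can also be run from the command line with `inspect eval`, pointing it at whichever model provider is configured.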
