Cybersecurity Dive, May 23, 2024
AI models released by “major labs” are highly vulnerable to even basic attempts to circumvent safeguards, the researchers found.
Dive Brief:
- The built-in safeguards of five large language models released by “major labs” are ineffective, according to research published Monday by the U.K. AI Safety Institute.
- The anonymized models were assessed by measuring the compliance, correctness and completion of responses. The evaluations were developed and run using the institute’s open-source model evaluation framework, Inspect, released earlier this month.
- “All tested LLMs remain highly vulnerable to basic jailbreaks, and some will provide harmful outputs even without dedicated attempts to circumvent their safeguards,” the institute said in the report. “We found that models comply with harmful questions across...