VentureBeat December 1, 2025
Louis Columbus

One malicious prompt gets blocked, while ten prompts get through. That gap defines the difference between passing benchmarks and withstanding real-world attacks — and it’s a gap most enterprises don’t know exists.

When attackers send a single malicious request, open-weight AI models hold the line well, blocking attacks 87% of the time on average. But when those same attackers spread the attack over a conversation, probing, reframing and escalating across numerous exchanges, the math inverts fast: attack success rates climb from 13% to 92%.

For CISOs evaluating open-weight models for enterprise deployment, the implications are immediate: The models powering your customer-facing chatbots, internal copilots and autonomous agents may pass single-turn safety benchmarks while failing catastrophically under sustained adversarial pressure.

...

Topics: AI (Artificial Intelligence), Cybersecurity, Technology