VentureBeat December 1, 2025
One malicious prompt gets blocked, while ten prompts get through. That gap defines the difference between passing benchmarks and withstanding real-world attacks — and it’s a gap most enterprises don’t know exists.
When attackers send a single malicious request, open-weight AI models hold the line well, blocking attacks 87% of the time (on average). But when those same attackers send multiple prompts across a conversation via probing, reframing and escalating across numerous exchanges, the math inverts fast. Attack success rates climb from 13% to 92%.
For CISOs evaluating open-weight models for enterprise deployment, the implications are immediate: The models powering your customer-facing chatbots, internal copilots and autonomous agents may pass single-turn safety benchmarks while failing catastrophically under sustained adversarial pressure.
...






