Key point: AI security requires fundamental differences from traditional cybersecurity: prompt injection creates a new exploit class for agents, specialized red-teaming models outperform humans at uncovering weaknesses, and larger models are not automatically more robust.
US export controls on the Mythos model have placed prompt injection attacks and jailbreak techniques at the center of the AI security debate. Zico Kolter and Matt Fredrikson from Gray Swan explain why traditional cybersecurity approaches are insufficient for AI systems.
The US government has brought the Mythos model into focus through export control directives. In parallel, prompt injection attacks and indirect prompt injections are emerging as critical security risks that have been underestimated to date. Zico Kolter, member of OpenAI’s Board of Directors in the Safety & Security Committee, and Matt Fredrikson, CMU professor and CEO of Gray Swan, have established themselves as subject matter experts through foundational work on indirect prompt injections and were directly consulted in evaluating the Mythos model.
AI security differs structurally from classical cybersecurity. Agents and Large Language Models represent a distinct vulnerability class: they can be compromised through prompt injections – a weakness that does not exist in traditional software. Shade, Anthropic’s tool for adversarial evaluation of robustness against prompt injection attacks in coding environments, examines precisely these cases. Gray Swan’s toolkit also includes Cygnal, a guardrails product, and the Gray Swan Arena – one of the world’s largest community platforms for red teaming.
Specialized red-teaming models can now outperform humans at systematically breaking AI systems. The finding here is: larger models are not automatically more robust. This supports the thesis that the next major AI incidents could emerge as “gray swan events” – events that are foreseeable but systematically overlooked. The lethal trifecta of AI security consists of untrusted input, sensitive private data, and exfiltration-prone systems.
Enterprise deployments require a fundamental rethink. Agent-native identities, permission models, and guardrail-based controls must be redesigned from the ground up – “better prompting” is not enough. Gray Swan posits that AI security will in future need to be integrated into insurance and compliance stacks to address the growing number of prompt injection breaches.
Source: www.latent.space · Published June 22, 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrase and classification by Lumi News Pipeline v1.7.1.