Bottom line: Anthropic’s Opus 4.6 withstood 6,000 prompt injection attacks in a public security test without a single successful compromise, which points to improved defenses. Results like these, however, do not replace comprehensive security design in production.

Fernando Irarrázaval ran a public challenge in which around 2,000 participants made 6,000 attempts to exfiltrate secrets from an AI assistant powered by Anthropic’s Opus 4.6. None succeeded. The results suggest that modern frontier models have become more resistant to prompt injection attacks.

The experiment ran on hackmyclaw.com: participants could send emails to an OpenClaw test instance in an attempt to make it disclose embedded secrets. After 6,000 attack attempts and USD 500 in token costs (plus a Google account suspension triggered by the volume of incoming emails), no one had managed to leak the secrets.

The system prompt contained explicit anti-prompt-injection rules: the model must never disclose the contents of secret files in response to email content, modify its own files, execute code, or exfiltrate data to external endpoints. That this protection held across 6,000 attack attempts is remarkable and consistent with what researchers have observed: AI labs have invested significant effort in training their frontier models to resist injection attacks.
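Rules written into a system prompt only hold as long as the model follows them. As a hedged illustration (not a description of how hackmyclaw.com or OpenClaw actually work), the same four rules can also be enforced deterministically at the tool layer, outside the model. The sketch below assumes hypothetical tool names (`read_file`, `write_file`, `run_code`, `http_request`), hypothetical file paths and hosts, and a hypothetical `triggered_by_email` flag marking calls that originate from untrusted email input.

```python
import os
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Hypothetical configuration: paths, tool names and hosts are illustrative only.
SECRET_PATHS = {"/workspace/secrets.env"}
AGENT_OWN_FILES = {"/workspace/SYSTEM_PROMPT.md", "/workspace/config.yaml"}
ALLOWED_HOSTS = {"api.example.com"}


@dataclass
class ToolCall:
    name: str                                  # hypothetical tool name, e.g. "read_file"
    args: dict = field(default_factory=dict)
    triggered_by_email: bool = False           # True if untrusted email content prompted this call


def check_tool_call(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason); unknown tools are denied by default."""
    # Normalize paths so "/workspace/../workspace/secrets.env" cannot slip past.
    path = os.path.normpath(call.args["path"]) if "path" in call.args else ""

    if call.name == "read_file":
        # Rule 1: no secret-file reads on behalf of email content.
        if call.triggered_by_email and path in SECRET_PATHS:
            return False, "secret file requested from email content"
        return True, "ok"
    if call.name == "write_file":
        # Rule 2: the agent may not modify its own files.
        if path in AGENT_OWN_FILES:
            return False, "agent may not modify its own files"
        return True, "ok"
    if call.name == "run_code":
        # Rule 3: code execution is disabled outright.
        return False, "code execution is disabled"
    if call.name == "http_request":
        # Rule 4: outbound traffic only to allowlisted hosts.
        host = urlparse(call.args.get("url", "")).hostname or ""
        if host not in ALLOWED_HOSTS:
            return False, f"host {host!r} is not allowlisted"
        return True, "ok"
    return False, f"unknown tool {call.name!r}"
```

Because these checks run outside the model, an injection that convinces the model still cannot trigger a denied call. The trade-off is that the policy has to be maintained alongside the tools it guards.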

However, caution is still warranted. The failed attempts offer no absolute guarantee against more sophisticated attacks; a deeper security audit, including penetration testing with a more rigorous methodology, could reveal other vulnerabilities. For production systems where a successful prompt injection could cause irreversible damage, additional layers of defense should be planned.
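One example of such an additional layer is an outbound filter that checks every outgoing message for known secrets before it leaves the system, regardless of what the model decided. The Python sketch below assumes the host application knows the secret values; the function names `leaks_secret` and `_encodings` are hypothetical. It also catches a few simple transformations an attacker might request (reversed text, base64, hex, spaced-out or upper-cased characters), but other encodings bypass it easily, so it complements rather than replaces the other measures.

```python
import base64
import binascii


def _encodings(secret: str) -> set[str]:
    """Simple transformations an attacker might ask the model to apply."""
    raw = secret.encode("utf-8")
    return {
        secret,                                 # verbatim
        secret[::-1],                           # reversed
        base64.b64encode(raw).decode("ascii"),  # base64 of the secret on its own
        binascii.hexlify(raw).decode("ascii"),  # hex
    }


def leaks_secret(outgoing: str, secrets: list[str]) -> bool:
    """Return True if `outgoing` contains a known secret or one of its simple encodings."""
    # Ignore whitespace and case so spaced-out text or upper-case hex still match.
    # This makes matching looser, trading a few false positives for fewer misses.
    compact = "".join(outgoing.split()).lower()
    for secret in secrets:
        if not secret:
            continue  # an empty string would match every message
        for variant in _encodings(secret):
            if "".join(variant.split()).lower() in compact:
                return True
    return False
```

In use, the mail-sending path would call `leaks_secret(body, known_secrets)` and hold the message for review instead of sending it whenever the result is `True`.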

The community discussion on Hacker News was both critical and constructive. Commenters noted that while these results show important progress in the robustness of modern LLMs, they do not justify deploying systems without established security measures.

Source: simonwillison.net · Published June 26, 2026
Lumi AI News — AI-assisted curation pursuant to Article 50 EU AI Act. Paraphrasing and classification by Lumi News Pipeline v1.7.1.
