AI-driven vulnerability discovery is no longer restricted to proprietary frontier models — smaller open-source models are already finding the same zero-days, so CISOs should assume that attackers will gain access within months.
Grammar-Constrained Decoding (GCD), a technique for ensuring syntactically correct code, opens a new jailbreak method for attackers with a success rate over 30 percentage points higher than previous approaches.
The security filter in Claude 3.5 Sonnet blocks legitimate security requests, limiting its usability for CTOs performing security audits and vulnerability assessments.
Trust in AI does not emerge automatically but must be systematically built through explainability measures depending on the application context and regulatory requirements.
Claude Fable 5 does not permit zero-data-retention contracts and retains all prompts and outputs for 30 days for security purposes, even where organizations have ZDR agreements with older Claude models.
Aligning router rows with the principal singular directions of their associated expert matrices improves the efficiency and stability of Mixture-of-Experts models.
Anthropic calls for an aviation-like regulatory authority or commissioned private auditors to examine AI models for critical risks before their release.
The Claw-SWE-Bench framework demonstrates that adapter design is critical for code agents: with a minimal adapter, OpenClaw achieves 19.1% Pass@1, with a complete adapter 73.4%.