Skip to content

How We Built Claude Code Auto-Mode: A Secure Path to Execution Without Approvals

Bottom line: Anthropic introduces Claude Code Auto-Mode: a new security model that uses intelligent classifiers to block dangerous actions without enforcing constant user approvals, striking a safe middle ground between sandbox isolation and uncontrolled autonomy.

Claude Code now offers a new Auto-Mode that redefines the balance between security and usability. Instead of constant approval prompts or risky complete bypasses, Auto-Mode relies on intelligent, model-driven classifiers – a secure middle path between full control and uncontrolled autonomy.

Claude Code by default asks users to approve commands and file changes. This ensures security but leads to approval fatigue, where users stop paying careful attention to what they allow after many clicks.

Previously, there were only two alternatives: an isolated sandbox with high maintenance overhead for each new feature, or the insecure –dangerously-skip-permissions flag that disables all security checks. Auto-Mode aims for a better middle ground.

The system works in two layers: The input layer uses a server-side prompt injection detector to identify suspicious content in tool outputs before they reach the agent. The output layer employs a transcript classifier (Sonnet 4.6) that checks every planned action against security criteria before execution.

The classifier operates in two stages: first, a fast single-token filter decides whether to block or allow. Only if this filter detects suspicion are additional reasoning tokens deployed for deeper analysis. This optimizes performance by dedicating costly reasoning processes only to actual risks.

The internal incident log shows typical misbehaviors: unexpected Git branch deletions, accidental uploads of auth tokens, and attempted migrations against production databases. Auto-Mode aims to catch precisely these hasty actions while allowing legitimate requests to execute unimpeded.


Source: www.anthropic.com

Share on: