Cyber-related safeguards on Claude Code explained — false positives, session poisoning, and alternatives for security automation.
"Cyber-related safeguards" is the softer wording Anthropic uses for the block you see when the cyber classifier fires before Claude answers. The session stops, you get a CVP link, and your agent run is dead.
Developers report hitting it during Cloudflare hardening, npm audit runs, kernel debugging, and even fiction worldbuilding that uses military jargon.
Safeguards vs model refusal
Safeguards run outside the main model, so a block is not the model itself declining your request. They can stop tool continuations, meaning the turn that follows Playwright or Read tool output, and not just your last message.
That makes automation brittle: the trigger may sit ten turns back in the accumulated context.
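Because the trigger can sit well before the turn that was blocked, it can help to scan an exported transcript for the earliest turn that introduced security vocabulary. The following is a minimal sketch under loud assumptions: the `WATCHLIST` terms and the `first_flagged_turn` helper are hypothetical, and plain substring matching is only a rough local heuristic. It does not reproduce how the real classifier decides anything.

```python
from typing import Optional, Sequence

# Hypothetical watchlist, assumed for illustration only.
# The real classifier's vocabulary and logic are not public here.
WATCHLIST = ("exploit", "payload", "privilege escalation")


def first_flagged_turn(
    turns: Sequence[str], watchlist: Sequence[str] = WATCHLIST
) -> Optional[int]:
    """Return the index of the earliest turn containing a watchlist term."""
    for index, text in enumerate(turns):
        lowered = text.lower()
        if any(term in lowered for term in watchlist):
            return index
    return None  # nothing matched the local heuristic


# Example transcript: user messages and tool output flattened to strings.
turns = [
    "Set up the Cloudflare firewall rules.",
    "Tool output: npm audit found an exploit path in a transitive dependency.",
    "Now update the README.",
]

candidate = first_flagged_turn(turns)
```

Here `candidate` is the index of the tool-output turn, which suggests rewinding to a point before it rather than only retrying the last message.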
Recovery steps
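Press Esc + Esc or run /rewind to return to a point before the block. Use /clear to start a fresh session. Avoid --continue on a poisoned transcript, because it reloads the same accumulated context that tripped the classifier.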
Split security tasks across sessions: RLS review in one, dependency audit in another.
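The splitting advice above can be sketched as a small planner that emits one independent command per task. This is a sketch, not an official workflow: the task names and prompts are invented for illustration, and it assumes the `claude` CLI is on your PATH and that `claude -p` runs a single non-interactive prompt in a new session. The script only builds and prints the commands; it does not run them.

```python
import shlex
from typing import Dict, List

# Hypothetical task list: one narrowly scoped prompt per session.
TASKS: Dict[str, str] = {
    "rls-review": "Review the row-level security policies in schema.sql.",
    "dependency-audit": "Summarize the findings in npm-audit.json.",
}


def build_session_commands(tasks: Dict[str, str]) -> List[List[str]]:
    """Build one independent CLI invocation per task.

    Assumption: each `claude -p` call starts from a clean context.
    --continue is deliberately never added, so no earlier transcript
    is carried into the next task.
    """
    return [["claude", "-p", prompt] for prompt in tasks.values()]


commands = build_session_commands(TASKS)

for name, argv in zip(TASKS, commands):
    # shlex.join quotes the prompt so the line is safe to paste into a shell.
    print(f"{name}: {shlex.join(argv)}")
```

Keeping each task in its own invocation means a block in the dependency audit cannot take the RLS review down with it.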
When to leave the stack
If you build security automation or API-backed copilots, classifier variance is a product risk. Uncensored APIs with predictable completions are the practical fix.
Icelake gives you uncensored inference with no chat storage — useful when Claude's safeguards keep firing on your audit vocabulary.
FAQ
Why does /compact trigger cyber safeguards?
Compaction summarizes the whole session. If the session contained security terms, the summary carries them forward and can trip the classifier on its own.