Incident · curated 27 Jun 2026

Quoting Matteo Wong, The Atlantic

First reported 16 Jun 2026 · 11d ago

Coverage timeline

Single-source incident — first reported, latest, and curated coincide.

It illustrates how phrasing can bypass an AI model's safety guardrails, a consideration for defenders relying on LLM refusal behaviors.

An Atlantic piece quotes cybersecurity expert Katie Moussouris discussing a White House report on a Claude jailbreak, where the model refused to 'review code for security issues' but complied when asked to 'fix this code.' Moussouris characterized this as the model working as intended for cyberdefense rather than a genuine exploit.

Why it matters

It illustrates how phrasing can bypass an AI model's safety guardrails, a consideration for defenders relying on LLM refusal behaviors.