New prompt injection papers: Agents Rule of Two and The Attacker Moves Second
2025-11-30
Meta AI's "Agents Rule of Two" framework proposes that, because reliable detection and filtering of prompt injection remains out of reach, an LLM agent session should satisfy no more than two of three properties: processing untrustworthy inputs, accessing sensitive systems or private data, and changing state or communicating externally. An agent that combines all three is exposed to high-impact prompt injection attacks. The framework extends Simon Willison's "lethal trifecta" beyond data exfiltration to cover broader risks, including harmful state changes from tool misuse triggered by untrusted input. The second paper, "The Attacker Moves Second", reinforces the pessimism about filtering defenses: adaptive attackers reliably bypass published jailbreak and prompt injection mitigations.
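To make the rule concrete, here's a minimal sketch of a configuration check an agent harness could run before starting a session. This is my illustration, not code from the paper: the names (`AgentCapabilities`, `violates_rule_of_two`) are hypothetical, since the paper describes a design principle rather than an API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """The three Rule of Two properties for a single agent session.
    Hypothetical sketch; the paper defines a principle, not this API."""
    processes_untrustworthy_inputs: bool   # [A] e.g. reads web pages, emails
    accesses_sensitive_systems: bool       # [B] e.g. private data, internal APIs
    changes_state_or_communicates: bool    # [C] e.g. writes files, sends requests


def violates_rule_of_two(caps: AgentCapabilities) -> bool:
    """Return True if all three properties hold in the same session."""
    enabled = sum([
        caps.processes_untrustworthy_inputs,
        caps.accesses_sensitive_systems,
        caps.changes_state_or_communicates,
    ])
    return enabled > 2


# Example: a browsing agent with access to private data that can also
# send outbound messages trips the check.
caps = AgentCapabilities(
    processes_untrustworthy_inputs=True,
    accesses_sensitive_systems=True,
    changes_state_or_communicates=True,
)
if violates_rule_of_two(caps):
    raise RuntimeError(
        "Rule of Two violated: drop one capability or add human approval"
    )
```

The error message reflects the paper's suggested mitigation: when a task genuinely needs all three properties, the agent shouldn't run autonomously, and one capability should be removed or gated behind human-in-the-loop approval.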