Design Patterns for Securing LLM Agents against Prompt Injections
2025-07-31
The paper proposes a set of design patterns for building LLM-based agents with principled resistance to prompt injection attacks, which exploit agents' reliance on natural-language inputs to steer them toward unauthorized actions. The authors systematically analyze the trade-off each pattern makes between security and utility, and demonstrate real-world applicability across ten case studies ranging from OS assistants to software engineering agents. The common thread is to deliberately constrain the actions an agent can take, so that injected text cannot repurpose it for arbitrary tasks, while leaving enough capability for the agent to remain genuinely useful.
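To make the constraint idea concrete, here is a minimal illustrative sketch (not the paper's reference implementation): the agent commits to a plan of tool calls using only the trusted user request, before any untrusted content enters the context, and may only ever invoke tools from a fixed allowlist. Under these assumptions, injected instructions found in fetched data can at most influence the data flowing through the pipeline, not which actions run. All names here (`ALLOWED_TOOLS`, `plan_from_trusted_request`, the tool functions) are hypothetical.

```python
# Hypothetical sketch of an "action-constrained" agent loop: the tool plan is
# fixed from the trusted request alone, so untrusted text seen later cannot
# add or swap actions.
from typing import Callable

# Hypothetical tools; an injected prompt cannot reach anything outside this set.
def fetch_page(arg: str) -> str:
    return f"<contents of {arg}, possibly containing injected instructions>"

def summarize(arg: str) -> str:
    return f"summary of: {arg[:60]}"

ALLOWED_TOOLS: dict[str, Callable[[str], str]] = {
    "fetch_page": fetch_page,
    "summarize": summarize,
}

def plan_from_trusted_request(request: str) -> list[str]:
    # Stand-in for an LLM planning call that sees ONLY the trusted request.
    # Any tool name the model hallucinates or an attacker smuggles in is dropped.
    proposed = ["fetch_page", "summarize", "delete_all_files"]
    return [name for name in proposed if name in ALLOWED_TOOLS]

def run(request: str) -> str:
    plan = plan_from_trusted_request(request)  # committed before untrusted data
    data = request
    for tool in plan:                          # tool output is treated as data,
        data = ALLOWED_TOOLS[tool](data)       # never as new instructions
    return data

if __name__ == "__main__":
    print(run("Summarize https://example.com/report"))
```

The security argument rests on the ordering: planning happens before any untrusted content is read, and execution cannot expand the plan, which is the spirit of the constrained-action patterns the paper analyzes.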