Return to Projects

 DoomArena

Project

AI agents are becoming increasingly powerful and ubiquitous. They now interact with users, tools, web pages, and databases—each of which introduces potential attack vectors for malicious actors. As a result, the security of AI agents has become a critical concern. DoomArena provides a modular, configurable framework that enables the simulation of realistic and evolving security threats against AI agents. It helps researchers and developers explore vulnerabilities, test defenses, and improve the security of AI systems. The DoomArena architecture comprises several key components that work together to create a flexible, powerful security testing environment for AI agents:

  • Attack Gateway: Functions as a wrapper around original agentic environments (TauBench, BrowserGym, OSWorld), injecting malicious content into the user-agent-environment loop as the AI agent interacts with it.
  • Threat Model: Defines which components of the agentic framework are attackable and specifies targets for the attacker, enabling fine-grained security testing.
  • Attack Config: Specifies the AttackableComponent, the AttackChoice (drawn from a library of implemented attacks), and the SuccessFilter which evaluates attack success.

DoomArena offers several advanced capabilities that make it a powerful and flexible framework for security testing of AI agents:

  • Plug-in: Plug into to your favorite agentic framework and environments τ-Bench, BrowserGym, OSWorld without requiring any modifications to their code.
  • Customizable threat models: Test agents against various threat models including malicious users and compromised environments.
  • Generic Attacker Agents: Develop and reuse attacker agents across multiple environments.
  • Defense Evaluation: Compare effectiveness of guardrail-based, LLM-powered, and security-by-design defenses.
  • Composable Attacks: Reuse and combine previously published attacks for comprehensive and fine-grained security testing.
  • Trade-off Analysis: Analyze the utility/security trade-off under various threat models.