ASR: Attack Success Rate—the fraction of tasks where the attacker's goal (e.g., stealing data) is achieved
TSR: Task Success Rate—the fraction of tasks where the agent successfully completes the user's original goal
Stealth Rate: The fraction of tasks where the attack succeeds AND the agent still completes the original user task (making the attack harder to notice)
Attack Gateway: A wrapper module that interfaces DoomArena with a specific environment (e.g., BrowserGym), handling the injection of malicious content into observations
Attack Config: A specification defining which component is malicious (User vs. Environment), what attack to use, and what constitutes success
Threat Model: A definition of which parts of the system are untrusted; for example, a 'Malicious User' model assumes the user input contains attacks
Gymnasium: A standard API for reinforcement learning environments where agents interact via reset() and step() methods
Guardrail: A safety mechanism (often a separate LLM) that monitors agent inputs/outputs and aborts execution if malicious content is detected
PII: Personally Identifiable Information—sensitive user data like names, addresses, or credit card numbers
ARIA labels: Accessibility attributes in HTML (e.g., aria-label) often used to hide prompt injections from humans while remaining visible to web agents