Data Minimization: The principle that an agent should use potentially sensitive information only if it is strictly necessary to perform its target task
POMDP: Partially Observable Markov Decision Process—a mathematical framework for modeling decision-making where the agent cannot directly observe the full state of the environment
VisualWebArena: A realistic simulated web environment for evaluating multimodal agents on tasks requiring visual and textual understanding
Accessibility Tree (axtree): A hierarchical text representation of a webpage's UI elements, used by assistive technologies and web agents to understand page structure
Set-of-Marks (SoM): A prompting technique where interactable elements on a screenshot are overlaid with bounding boxes and numeric IDs to help VLMs select elements
Chain-of-Thought (CoT): A prompting strategy that encourages the model to generate intermediate reasoning steps before producing a final answer
Privacy Leakage Rate: The fraction of task instances where the agent inadvertently reveals task-irrelevant sensitive information in its output
VLM: Vision-Language Model—an AI model capable of processing and generating both text and images