AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents

📝 Paper Summary

Web agents Privacy evaluation Benchmark construction

AgentDAM is an end-to-end benchmark that evaluates whether autonomous web agents adhere to data minimization principles by checking if they leak unnecessary sensitive information during task execution.

Core Problem

Autonomous agents often require access to sensitive user data to function, but current evaluations fail to measure whether agents inadvertently leak irrelevant sensitive information during execution.

Why it matters:

Agents handling tasks like bill payments or scheduling have access to highly sensitive data (credit cards, emails), creating risk of inappropriate exposure.
Existing privacy benchmarks focus on training data memorization or simply probe LLMs via Q&A, failing to capture leakage risks during actual multi-step tool execution.
Current agents prioritize task completion (utility) but lack mechanisms to distinguish between necessary and unnecessary data for a specific context.

Concrete Example: An agent tasked with commenting on a GitLab pull request (relevant data) also has access to a chat history mentioning a colleague's upcoming absence (irrelevant sensitive data). The agent successfully approves the PR but unnecessarily includes the colleague's absence in the public comment, violating data minimization.

Key Novelty

AgentDAM (Agent Data Minimization) Benchmark

Constructs realistic web navigation tasks (Reddit, GitLab, Shopping) where agents possess both task-relevant data and task-irrelevant sensitive data.
Evaluates agents 'in action' using a simulated environment (VisualWebArena) rather than just probing the underlying LLM with static questions.
introduces an LLM-based judge to automatically detect if the agent's actions (e.g., posting comments, filling forms) leak the irrelevant sensitive information.

Architecture

The AgentDAM evaluation workflow, illustrating how tasks are constructed with mixed data and how agent trajectories are judged.

Evaluation Highlights

Standard web agents (GPT-4o, Claude-3.5) leak sensitive information in 12% to 46% of tasks when using default scaffolding.
Directly probing LLMs about privacy underestimates leakage risk compared to evaluating agents in end-to-end execution contexts.
A privacy-aware system prompt with Chain-of-Thought reasoning reduces leakage significantly (e.g., from 27.6% to 0% for GPT-4o-mini on Reddit) with minimal utility loss.

Breakthrough Assessment

7/10

Provides a necessary and novel benchmark for inference-time privacy in agents, a neglected area compared to training data privacy. The finding that agents leak data despite knowing privacy rules is significant.

⚙️ Technical Details

Problem Definition

Setting: Partially Observable Markov Decision Process (POMDP) where agents navigate web environments to complete tasks while minimizing data exposure

Inputs: User instruction, user data (containing both relevant and sensitive/irrelevant information), and current webpage state (accessibility tree or screenshot)

Outputs: Sequence of actions (clicks, typing) to complete the task

Pipeline Flow

Observation Processing (Instruction + Private Data + Web State)
Agent Reasoning & Action Generation
Environment Execution (WebArena)
Privacy & Utility Evaluation

System Modules

Observation Processor

Combines user instructions, a synthetic dataset containing sensitive info, and the current webpage representation (axtree or screenshot)

Model or implementation: N/A (Deterministic)

Agent Backbone

Determines the next action based on observations and instructions

Model or implementation: Various LLMs (GPT-4o, Llama-3, Claude-3.5-Sonnet)

Privacy Evaluator (LLM Judge)

Analyzes agent trajectories to detect if irrelevant sensitive data was included in outputs

Model or implementation: GPT-4o

Novel Architectural Elements

Privacy-centric task construction: Coupling specific user instructions with synthetic 'user data' plots containing distractor sensitive information to test contextual data minimization

Modeling

Base Model: Evaluated on GPT-4o, GPT-4o-mini, Llama-3-70B-Instruct, Llama-3.1-405B-Instruct, Claude-3.5-Sonnet v2

Reproducibility

Code: https://github.com/facebookresearch/ai-agent-privacy

📊 Experiments & Results

Evaluation Setup

End-to-end web navigation tasks in simulated environments (Reddit, GitLab, Shopping)

Benchmarks:

AgentDAM (Privacy-constrained web navigation) [New]

Metrics:

Privacy Leakage Rate (lower is better)
Success Rate (Utility, higher is better)
Privacy Performance (1 - Leakage Rate)
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Baseline privacy leakage rates across different models and environments using default scaffolding (WebArena/VisualWebArena). High leakage indicates agents fail to minimize data usage.
AgentDAM (Reddit)	Privacy Leakage Rate	0.0	0.276	+0.276
AgentDAM (Shopping)	Privacy Leakage Rate	0.0	0.456	+0.456
AgentDAM (GitLab)	Privacy Leakage Rate	0.0	0.339	+0.339
Effectiveness of mitigation strategies (Privacy-aware System Prompt + CoT) on reducing leakage rates.
AgentDAM (Reddit)	Privacy Leakage Rate	0.276	0.0	-0.276
AgentDAM (Shopping)	Privacy Leakage Rate	0.456	0.05	-0.406

Experiment Figures

An illustrative example of a privacy leak in a GitLab task.

Main Takeaways

Current autonomous agents are prone to inadvertent privacy leakage (12-46%) when tasked with handling mixed sensitivity data.
Directly asking LLMs if a disclosure is appropriate yields optimistic results that do not correlate with their actual behavior during task execution.
Pre-filtering data with an LLM is ineffective, but Privacy-aware System Prompts with Chain-of-Thought reasoning significantly reduce leakage without severely impacting task success.

📚 Prerequisite Knowledge

Prerequisites

Understanding of LLM-based agents and tool use
Familiarity with web navigation environments (WebArena)
Basic concepts of data privacy and data minimization

Key Terms

Data Minimization: The principle that an agent should use potentially sensitive information only if it is strictly necessary to perform its target task

POMDP: Partially Observable Markov Decision Process—a mathematical framework for modeling decision-making where the agent cannot directly observe the full state of the environment

VisualWebArena: A realistic simulated web environment for evaluating multimodal agents on tasks requiring visual and textual understanding

Accessibility Tree (axtree): A hierarchical text representation of a webpage's UI elements, used by assistive technologies and web agents to understand page structure

Set-of-Marks (SoM): A prompting technique where interactable elements on a screenshot are overlaid with bounding boxes and numeric IDs to help VLMs select elements

Chain-of-Thought (CoT): A prompting strategy that encourages the model to generate intermediate reasoning steps before producing a final answer

Privacy Leakage Rate: The fraction of task instances where the agent inadvertently reveals task-irrelevant sensitive information in its output

VLM: Vision-Language Model—an AI model capable of processing and generating both text and images