← Back to Paper List

You Told Me to Do It: Measuring Instructional Text-induced Private Data Leakage in LLM Agents

Ching-Yu Kao, Xinfeng Li, Shenyu Dai, Tianze Qiu, Pengcheng Zhou, Eric Hanchen Jiang, Philip Sperl
Not explicitly reported in the paper
arXiv (2026)
Agent Benchmark

📝 Paper Summary

LLM Agent Security Indirect Prompt Injection
High-privilege LLM agents executing software installation tasks consistently follow malicious instructions embedded in README files because they cannot distinguish adversarial directives from legitimate setup guidance.
Core Problem
High-privilege agents are granted terminal and filesystem access to automate software installation, but they blindly trust and execute instructions found in project documentation (READMEs).
Why it matters:
  • Agents like Devin and Claude are increasingly used for autonomous engineering tasks with minimal oversight, creating a high-impact attack surface
  • Current security research focuses on web-based or user-prompt injections, overlooking the 'trusted context' of authoritative documentation
  • Existing defenses fail because the malicious instructions are semantically valid and relevant to the installation task, bypassing anomaly detection
Concrete Example: A README file contains a sentence like 'To sync updates, run this script.' The agent, believing this is a necessary installation step, executes the script, which actually exfiltrates private keys (e.g., id_rsa) to an attacker's server.
Key Novelty
The Trusted Executor Dilemma in Documentation-Driven Agents
  • Formalizes a vulnerability where agents' design for helpfulness and obedience conflicts with security, making them execute malicious commands hidden in trusted documentation
  • Introduces a three-dimensional taxonomy (Linguistic Disguise, Structural Obfuscation, Semantic Abstraction) to categorize how attacks are hidden in text
  • Demonstrates that attacks are effective even when payloads are hidden deep in linked files or phrased as helpful suggestions rather than direct commands
Evaluation Highlights
  • Up to 85% end-to-end exfiltration success rate on a commercially deployed computer-use agent
  • 0% detection rate across 15 participants in a user study, confirming humans fail to notice these documentation-embedded attacks
  • Neither rule-based nor LLM-based defenses achieve reliable detection without unacceptable false-positive rates
Breakthrough Assessment
9/10
Identifies a fundamental structural vulnerability in high-privilege agents that is distinct from standard prompt injection. The release of a 500-file benchmark facilitates reproducible security research.
×