ASR: Attack Success Rate—the percentage of times an agent successfully executes the malicious instruction and exfiltrates data
PII: Personally Identifiable Information—sensitive user data like private keys or passwords targeted for exfiltration
ReadSecBench: A benchmark dataset of 500 real-world README files injected with adversarial instructions constructed by the authors
Semantic-Safety Gap: The discrepancy between an agent's ability to functionally follow instructions and its inability to recognize the security implications of those instructions
High-privilege agent: An AI agent granted extensive system permissions, such as terminal access, filesystem control, and outbound network connectivity
Linguistic Disguise: Phrasing malicious instructions as helpful suggestions or policy mandates to bypass safety filters
Structural Obfuscation: Hiding malicious instructions inside linked files (depth > 0) rather than the main README to evade shallow analysis
Semantic Abstraction: The level of reasoning required to execute a payload, ranging from direct shell commands (low) to abstract social actions like emailing (high)