← Back to Paper List

Memory Injection Attacks on LLM Agents via Query-Only Interaction

Shen Dong, Shaochen Xu, Pengfei He, Yige Li, Jiliang Tang, Tianming Liu, Hui Liu, Zhen Xiang
Michigan State University, University of Georgia, Singapore Management University
arXiv (2025)
Memory Agent Reasoning

📝 Paper Summary

Adversarial Attacks on LLM Agents Memory Safety in Agents
MINJA enables unprivileged users to poison LLM agent memories by using progressive query shortening to force the agent to autonomously generate and store malicious reasoning steps.
Core Problem
LLM agents rely on long-term memory for in-context learning, but existing memory poisoning attacks require unrealistic privileged access to directly modify the memory bank or other users' queries.
Why it matters:
  • Current attacks like AgentPoison assume attackers can directly edit the database, which is often impossible for regular users
  • If memory is compromised, agents (e.g., in healthcare or autonomous driving) can be misled into retrieving malicious demonstrations that cause fatal errors
  • Shared memory banks are common in deployed agents (e.g., ChatGPT, Waymo) for performance, making them vulnerable to user-side injection
Concrete Example: In a medical agent, a victim queries for patient A's prescription. If an attacker has injected a record linking A to patient B, the agent might retrieve this record and reason that 'Data of A is saved under B', causing it to dispense patient B's prescription to patient A.
Key Novelty
Memory INJection Attack (MINJA) via Progressive Shortening
  • Uses 'bridging steps' to create a logical link between a benign victim term (e.g., Patient A) and a target malicious action (e.g., treat as Patient B)
  • Appends an 'indication prompt' to queries to force the agent to generate these bridging steps autonomously in its output
  • Employes a Progressive Shortening Strategy (PSS) that gradually removes the indication prompt over multiple turns, leaving only a clean-looking query paired with the malicious reasoning in the agent's memory
Architecture
Architecture Figure Figure 1
Overview of the MINJA attack process, contrasting the 'Ideal Malicious Record' with the 'Progressive Shortening Strategy'
Evaluation Highlights
  • 98.2% average Memory Injection Success Rate (MISR) across three diverse agents, demonstrating the ability to successfully implant malicious records without direct access
  • 76.8% average Attack Success Rate (ASR) in eliciting malicious reasoning steps from the agent when the victim subsequently queries the poisoned system
Breakthrough Assessment
8/10
Significantly lowers the barrier for agent attacks by removing the requirement for direct memory access or trigger injection into victim queries, making memory poisoning feasible for regular users.
×