Memory Injection Attacks on LLM Agents via Query-Only Interaction

📝 Paper Summary

Adversarial Attacks on LLM Agents Memory Safety in Agents

MINJA enables unprivileged users to poison LLM agent memories by using progressive query shortening to force the agent to autonomously generate and store malicious reasoning steps.

Core Problem

LLM agents rely on long-term memory for in-context learning, but existing memory poisoning attacks require unrealistic privileged access to directly modify the memory bank or other users' queries.

Why it matters:

Current attacks like AgentPoison assume attackers can directly edit the database, which is often impossible for regular users
If memory is compromised, agents (e.g., in healthcare or autonomous driving) can be misled into retrieving malicious demonstrations that cause fatal errors
Shared memory banks are common in deployed agents (e.g., ChatGPT, Waymo) for performance, making them vulnerable to user-side injection

Concrete Example: In a medical agent, a victim queries for patient A's prescription. If an attacker has injected a record linking A to patient B, the agent might retrieve this record and reason that 'Data of A is saved under B', causing it to dispense patient B's prescription to patient A.

Key Novelty

Memory INJection Attack (MINJA) via Progressive Shortening

Uses 'bridging steps' to create a logical link between a benign victim term (e.g., Patient A) and a target malicious action (e.g., treat as Patient B)
Appends an 'indication prompt' to queries to force the agent to generate these bridging steps autonomously in its output
Employes a Progressive Shortening Strategy (PSS) that gradually removes the indication prompt over multiple turns, leaving only a clean-looking query paired with the malicious reasoning in the agent's memory

Architecture

Overview of the MINJA attack process, contrasting the 'Ideal Malicious Record' with the 'Progressive Shortening Strategy'

Evaluation Highlights

98.2% average Memory Injection Success Rate (MISR) across three diverse agents, demonstrating the ability to successfully implant malicious records without direct access
76.8% average Attack Success Rate (ASR) in eliciting malicious reasoning steps from the agent when the victim subsequently queries the poisoned system

Breakthrough Assessment

8/10

Significantly lowers the barrier for agent attacks by removing the requirement for direct memory access or trigger injection into victim queries, making memory poisoning feasible for regular users.

⚙️ Technical Details

Problem Definition

Setting: Reasoning-based agent pipeline with retrieval-augmented generation from a long-term memory bank

Inputs: User query q containing a victim term v

Outputs: Reasoning steps R_q and subsequent actions

Pipeline Flow

Input Processing: Receive user query q
Retrieval: Fetch k relevant past records {(q_i, R_q_i)} from Memory Bank
Reasoning: Generate reasoning steps R_q and action based on q and retrieved demonstrations
Storage: Store (q, R_q) in Memory Bank based on user feedback/policy

System Modules

Planning Module

Generate reasoning steps and decide on tool usage

Model or implementation: LLM (specific architecture not detailed in text)

Memory Bank (LTM)

Store and retrieve past interaction records (query, reasoning/output)

Model or implementation: Vector Database (implied by 'retrieved based on query similarity')

Novel Architectural Elements

None (The paper attacks standard agent architectures; the novelty is the attack procedure PSS, not the system architecture itself)

Comparison to Prior Work

vs. AgentPoison: MINJA does not require privileged access to modify the memory bank directly; it uses query interaction only
vs. Conventional Backdoors: MINJA targets inference-time memory (in-context learning) rather than model weights

Limitations

Relies on the agent system using a shared memory bank or the attacker being able to influence the victim's memory (e.g., via shared profiles)
Success depends on the agent's ability to follow the indication prompt and generate the bridging steps initially
The progressive shortening strategy requires multiple interactions to successfully implant a clean-looking malicious record

Reproducibility

Code is stated to be available on GitHub in the abstract, but the URL is not provided in the text. The method (PSS) is described algorithmically. Specific prompt templates for the indication prompts are described conceptually (e.g., 'we should refer to patient B').

📊 Experiments & Results

Evaluation Setup

Evaluation on three distinct agents (Calendar, Database, Weather [inferred from text]) under query-only attack constraints

Benchmarks:

Three diverse agents (Agentic reasoning and retrieval) [New]

Metrics:

Memory Injection Success Rate (MISR)
Attack Success Rate (ASR)
Statistical methodology: Not explicitly reported in the paper

Main Takeaways

MINJA achieves a high average Memory Injection Success Rate (98.2%) across diverse agents, proving that progressive shortening effectively bypasses the difficulty of generating malicious records from benign queries.
The attack successfully translates to harmful outcomes, with a 76.8% Attack Success Rate, meaning the agent frequently adopts the malicious reasoning when the victim queries.
The method functions under strict constraints where the attacker cannot modify the victim's query or the memory database directly, highlighting a critical vulnerability in shared-memory agent deployments.

📚 Prerequisite Knowledge

Prerequisites

Understanding of Retrieval-Augmented Generation (RAG) and In-Context Learning (ICL)
Basic knowledge of LLM agent architectures (Memory, Planning, Tools)
Familiarity with prompt injection and data poisoning concepts

Key Terms

MINJA: Memory INJection Attack—the proposed method to poison agent memory via query interactions only

Bridging Steps: Intermediate reasoning steps designed to logically connect a victim entity (e.g., Patient A) to a target entity (e.g., Patient B) to justify malicious actions

Indication Prompt: A supplementary prompt appended to the attacker's query to induce the agent to generate the specific bridging steps in its response

PSS: Progressive Shortening Strategy—a technique of gradually truncating the indication prompt over multiple interactions so the final stored memory record looks benign but contains malicious reasoning

LTM: Long-Term Memory—a storage component in agents that retains past query-response pairs to serve as demonstrations for future tasks

STM: Short-Term Memory—a temporary workspace retaining reasoning/actions for the current query only