Memory in the Age of AI Agents

📝 Paper Summary

Agent memory architecture · Long-term memory for LLMs · Agentic RAG

The survey unifies fragmented research on agent memory into a single taxonomy of Forms (structure), Functions (utility), and Dynamics (evolution). It distinguishes agent memory from RAG and context engineering, and identifies future research frontiers.

Core Problem

Research on agent memory is fragmented with inconsistent terminology, making it difficult to distinguish true agentic memory from related concepts like RAG or simple context window management.

Why it matters:

Current definitions conflate 'LLM memory' (context caching) with 'Agent memory' (persistent, evolving cognitive state), which makes it hard to tell which problem a given system is actually solving.
Traditional 'long-term/short-term' taxonomies fail to capture the complexity of modern agents that need to evolve, forget, and consolidate experience over long horizons.
Developers lack a unified framework to design agents that can maintain identity and learn skills across varying tasks without starting from scratch.

Concrete Example: Early systems like MemoryBank framed their contributions as 'LLM memory', but they were actually addressing agentic challenges like tracking user preferences across days. Without a clear taxonomy, a researcher might confuse architectural caching (like Mamba) with cognitive memory (like a user profile), leading to misaligned system designs.

Key Novelty

Unified Forms-Functions-Dynamics Taxonomy

Forms: Classifies memory into Token-level (discrete text/visual units), Parametric (encoded in model weights), and Latent (hidden states/activations).
Functions: Distinguishes Factual (knowledge), Experiential (skills/history), and Working memory (current task workspace) rather than just temporal duration.
Dynamics: Models memory not as static storage but as a lifecycle of Formation (creation), Evolution (consolidation/forgetting), and Retrieval (access).
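
As an illustration only, the three axes above can be written down as plain enumerations, with a small label type that places a memory mechanism on the Form and Function axes. The class names (`Form`, `Function`, `Stage`, `MemoryLabel`) and the three example mechanisms are hypothetical and chosen for this sketch; they are not taken from the survey.

```python
from dataclasses import dataclass
from enum import Enum


class Form(Enum):
    TOKEN_LEVEL = "token-level"    # discrete text/visual units
    PARAMETRIC = "parametric"      # encoded in model weights
    LATENT = "latent"              # hidden states / activations


class Function(Enum):
    FACTUAL = "factual"            # knowledge
    EXPERIENTIAL = "experiential"  # skills / history
    WORKING = "working"            # current task workspace


class Stage(Enum):
    FORMATION = "formation"        # creation
    EVOLUTION = "evolution"        # consolidation / forgetting
    RETRIEVAL = "retrieval"        # access


@dataclass(frozen=True)
class MemoryLabel:
    """Hypothetical tag placing one memory mechanism on the Form and Function axes."""
    name: str
    form: Form
    function: Function


def group_by_form(labels):
    """Group mechanism names by their Form, keeping every Form as a key."""
    groups = {form: [] for form in Form}
    for label in labels:
        groups[label.form].append(label.name)
    return groups


# Illustrative placements only (assumptions made for this sketch).
labels = [
    MemoryLabel("user preference notes", Form.TOKEN_LEVEL, Function.FACTUAL),
    MemoryLabel("fine-tuned skill adapter", Form.PARAMETRIC, Function.EXPERIENTIAL),
    MemoryLabel("persistent hidden-state summary", Form.LATENT, Function.WORKING),
]
groups = group_by_form(labels)
```

Form and Function are independent axes in this sketch, so one mechanism carries one value from each, while `Stage` describes what is happening to a memory at a given moment rather than what kind of memory it is.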

Evaluation Highlights

No quantitative evaluation results reported in the paper (this is a survey paper).
Compiles a list of key benchmarks including LoCoMo, LongMemEval, and GAIA.
Summarizes open-source frameworks like Memary, MemOS, and Mem0.

Breakthrough Assessment

9/10

Provides a crucial, clarifying taxonomy for a rapidly saturating field. By rigorously distinguishing Agent Memory from RAG and Context Engineering, it sets the standard for future definitions.

⚙️ Technical Details

Problem Definition

Setting: LLM-based agent systems interacting with environments over time, requiring persistence beyond a single context window.

Inputs: Observation history, environmental feedback, user instructions.

Outputs: Actions (tool use, planning, text) conditioned on retrieved memory states.

Pipeline Flow

Environment Observation (Agent perceives state)
Memory Retrieval (Selects relevant context from storage)
Reasoning/Planning (LLM generates action)
Action Execution (Agent acts on environment)
Memory Formation/Evolution (New experience is consolidated/updated in storage)
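
The five steps above can be sketched as a single loop. This is a minimal sketch, assuming memory is a list of strings and that each step is supplied as a plain callable; `run_agent` and all `toy_*` helpers are hypothetical names for illustration, and in a real agent the planning step would be an LLM call.

```python
from typing import Callable, List


def run_agent(
    observe: Callable[[], str],
    retrieve: Callable[[List[str], str], List[str]],
    plan: Callable[[str, List[str]], str],
    act: Callable[[str], str],
    update: Callable[[List[str], str, str, str], List[str]],
    memory: List[str],
    steps: int,
) -> List[str]:
    """Run the observe -> retrieve -> plan -> act -> update cycle for a fixed number of steps."""
    for _ in range(steps):
        observation = observe()                    # 1. environment observation
        context = retrieve(memory, observation)    # 2. memory retrieval
        action = plan(observation, context)        # 3. reasoning/planning
        feedback = act(action)                     # 4. action execution
        memory = update(memory, observation, action, feedback)  # 5. formation/evolution
    return memory


# Toy stand-ins for each step (assumptions for this sketch only).
observations = iter(["door is locked", "door is locked"])


def toy_observe() -> str:
    return next(observations)


def toy_retrieve(memory: List[str], obs: str) -> List[str]:
    # Return past records that mention the current observation.
    return [m for m in memory if obs in m]


def toy_plan(obs: str, context: List[str]) -> str:
    # With no relevant memory, try the naive action; otherwise switch strategy.
    return "use key" if context else "try handle"


def toy_act(action: str) -> str:
    return "opened" if action == "use key" else "failed"


def toy_update(memory: List[str], obs: str, action: str, feedback: str) -> List[str]:
    return memory + [f"{obs} | {action} -> {feedback}"]


final_memory = run_agent(
    toy_observe, toy_retrieve, toy_plan, toy_act, toy_update, memory=[], steps=2
)
```

In the toy run, the first step finds nothing in memory and fails; the second step retrieves that failure and picks a different action, which is the behavioural difference that persistence across steps is meant to provide.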

System Modules

Memory Formation (Dynamics)

Selectively transforms raw artifacts (tool outputs, reasoning traces) into memory candidates

Model or implementation: Generic function F(M_t, phi_t)

Memory Evolution (Dynamics)

Integrates candidates into the base, handling consolidation, conflict resolution, and forgetting

Model or implementation: Generic function E(M_form)

Memory Retrieval (Dynamics)

Constructs queries to access relevant memory for the current decision

Model or implementation: Generic function R(M_t, o_t, Q)
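
The three operators can be made concrete with a toy token-level store. This is a hedged sketch, not the survey's formalism: the selection rule (minimum word count), the conflict rule (newest duplicate wins), the forgetting rule (age threshold), and the word-overlap ranking are all assumptions chosen for brevity. Note also that the summary writes evolution as E(M_form), whereas `evolve` here takes the existing memory as an explicit argument.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    text: str
    created_step: int


def form(artifacts: List[str], step: int, min_words: int = 3) -> List[Entry]:
    """Formation: keep only artifacts with at least min_words words (toy selection rule)."""
    return [Entry(a.strip(), step) for a in artifacts if len(a.split()) >= min_words]


def evolve(memory: List[Entry], candidates: List[Entry], step: int, max_age: int = 10) -> List[Entry]:
    """Evolution: merge candidates, keep the newest of case-insensitive duplicates, drop stale entries."""
    latest = {}
    for entry in memory + candidates:
        key = entry.text.lower()
        if key not in latest or entry.created_step >= latest[key].created_step:
            latest[key] = entry
    return [e for e in latest.values() if step - e.created_step <= max_age]


def retrieve(memory: List[Entry], observation: str, top_k: int = 2) -> List[Entry]:
    """Retrieval: build a bag-of-words query from the observation and rank entries by overlap."""
    query = set(observation.lower().split())
    scored = [(len(query & set(e.text.lower().split())), e) for e in memory]
    scored = [(score, e) for score, e in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:top_k]]


# Usage: two rounds of formation and evolution, then one retrieval.
memory: List[Entry] = []
memory = evolve(
    memory,
    form(["ok", "user prefers dark mode", "tool returned 404 for /v1/items"], step=0),
    step=0,
)
memory = evolve(memory, form(["User prefers dark mode"], step=12), step=12)
hits = retrieve(memory, "which mode does the user prefer")
```

After the second round, the repeated preference has replaced its older copy and the tool error from step 0 has aged out, so only one entry remains for retrieval to return.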

Novel Architectural Elements

Explicit distinction of 'Latent Memory' as a form (e.g., MemoryLLM, MemGen) distinct from standard external databases or fixed parameters.
Separation of 'Experiential Memory' (procedural/episodic skills) from 'Factual Memory' (semantic knowledge) in the functional taxonomy.

Comparison to Prior Work

vs. LLM Memory: Agent memory focuses on cognitive modeling and cross-task persistence, whereas LLM memory focuses on architectural capacity (context length/cache).
vs. RAG: Agent memory is self-evolving and internal to the agent's identity, whereas RAG typically accesses static external databases for single tasks.
vs. Context Engineering: Agent memory maintains a persistent state beyond the context window, whereas context engineering manages the immediate resource constraints of the window.
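
The RAG contrast above can be illustrated with two toy classes: a read-only retriever over a fixed corpus, and a store that the agent rewrites as it gains experience. Both classes and their method names are hypothetical and deliberately simplified; real RAG systems and memory frameworks are far richer, and the summary itself notes that the boundary is blurry in practice.

```python
class StaticRetriever:
    """RAG-style access: a fixed corpus queried per task; the agent never modifies it."""

    def __init__(self, corpus):
        self._corpus = tuple(corpus)  # frozen at construction time

    def query(self, term):
        # Case-insensitive substring match over the fixed documents.
        return [doc for doc in self._corpus if term.lower() in doc.lower()]


class EvolvingMemory:
    """Agent-memory-style state: persists across tasks and is updated by experience."""

    def __init__(self):
        self._facts = {}

    def write(self, key, value):
        # Newer experience overwrites the older value for the same key.
        self._facts[key] = value

    def query(self, key):
        return self._facts.get(key)


rag = StaticRetriever(["Paris is the capital of France."])
mem = EvolvingMemory()
mem.write("preferred_language", "Python")  # learned during one task
mem.write("preferred_language", "Rust")    # revised during a later task

rag_hits = rag.query("paris")
current_preference = mem.query("preferred_language")
```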

Limitations

Survey scope necessitates high-level abstraction; individual system nuances may be lost in the taxonomy.
The boundary between 'Agentic RAG' and 'Agent Memory' remains blurry in practice despite conceptual distinctions.
Rapid evolution of the field means the list of frameworks and benchmarks may quickly become dated.

Reproducibility

Code: https://github.com/Shichun-Liu/Agent-Memory-Paper-List

This is a survey paper. The authors provide a GitHub repository containing the list of papers discussed. There is no model training or code to reproduce.

📊 Experiments & Results

Evaluation Setup

Survey paper: qualitative analysis of existing literature.

Benchmarks:

LoCoMo (Long-context dialogue evaluation)
LongMemEval (Long-term memory evaluation)
GAIA (Complex problem-solving and deep research)

Metrics:

None reported (Survey)
Statistical methodology: Not explicitly reported in the paper

Main Takeaways

Agent memory is distinct from LLM memory and RAG; it requires establishing a persistent, evolving cognitive state.
Memory forms are diversifying beyond text logs (Token-level) to include Parametric (weight updates) and Latent (hidden state) forms.
Future frontiers include automated memory management (removing manual design), integration with Reinforcement Learning (internalizing memory policies), and shared memory in multi-agent systems.

📚 Prerequisite Knowledge

Prerequisites

Understanding of LLM-based agents (e.g., ReAct, reflection)
Familiarity with RAG (Retrieval-Augmented Generation) architectures
Basic knowledge of context window constraints in Transformers

Key Terms

Agent Memory: A persistent, self-evolving cognitive state that integrates factual knowledge and experience across tasks, distinct from static RAG or transient context buffers.

Parametric Memory: Memory stored directly within the model's weights, often updated via fine-tuning or editing, accessed implicitly during forward passes.

Latent Memory: Memory represented as continuous hidden states or activations that persist across steps, rather than discrete text tokens.

Token-level Memory: Discrete units (text, visual tokens) stored externally that can be explicitly inspected, modified, and retrieved.

Experiential Memory: Records of past actions, outcomes, and reasoning traces used to improve future problem-solving (distinct from static facts).

Context Engineering: Resource management paradigm optimizing the context window payload, focusing on interface correctness rather than cognitive continuity.

RAG: Retrieval-Augmented Generation—typically serving static external knowledge for single-turn queries, unlike the evolving internal state of Agent Memory.

KV cache: Key-Value cache—storing attention computations to speed up generation; categorized here as 'LLM Memory' rather than 'Agent Memory'.