Memory in the Age of AI Agents: A Survey (Forms, Functions and Dynamics)

📝 Paper Summary

Memory organization, Memory recall, Agent evolution

This survey unifies fragmented research on agent memory into a coherent taxonomy defined by Forms (token/parametric/latent), Functions (factual/experiential/working), and Dynamics (formation/evolution/retrieval).

Core Problem

Research on agent memory is fragmented and its terminology is inconsistent: concepts such as 'LLM memory', 'RAG', and 'agent memory' are conflated, which hinders systematic development.

Why it matters:

Lack of standardized definitions makes it difficult to compare mechanisms like short-term memory versus context window engineering
Existing taxonomies fail to capture emerging 2025 trends such as memory-augmented test-time scaling or reusable tool distillation
Without clear conceptual boundaries, researchers cannot effectively distinguish between static knowledge retrieval (RAG) and self-evolving agentic experience

Concrete Example: Early systems like MemoryBank and MemGPT were framed as 'LLM memory' solutions for extending context, but they are functionally 'agent memory' because they enable decision-making entities to track user preferences and accumulate experience across multi-turn interactions, unlike purely architectural context extensions.

Key Novelty

Unified Forms-Functions-Dynamics Taxonomy

Distinguishes Agent Memory from RAG and Context Engineering: Agent memory focuses on persistent, self-evolving cognitive states rather than just static retrieval or resource management
Classifies memory by Form: Token-level (discrete units like text), Parametric (weights), and Latent (hidden states)
Classifies memory by Function: Factual (world knowledge), Experiential (procedural skills/cases), and Working (current task workspace)
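
The two classification axes above can be read as independent labels attached to each memory record. The following is a minimal Python sketch of that idea, not code from the survey: the class names, the `EXAMPLES` table, and the `by_function` helper are hypothetical, and the example placements are assumptions chosen to match the definitions in this summary.

```python
from dataclasses import dataclass
from enum import Enum


class Form(Enum):
    """How the memory is physically represented."""
    TOKEN = "token-level"      # discrete units such as text
    PARAMETRIC = "parametric"  # model weights
    LATENT = "latent"          # hidden states


class Function(Enum):
    """What role the memory plays for the agent."""
    FACTUAL = "factual"            # world knowledge
    EXPERIENTIAL = "experiential"  # procedural skills and cases
    WORKING = "working"            # current task workspace


@dataclass(frozen=True)
class MemoryLabel:
    form: Form
    function: Function


# Illustrative placements only; these are assumptions made for this sketch.
EXAMPLES = {
    "user profile stored as JSON": MemoryLabel(Form.TOKEN, Function.FACTUAL),
    "skill distilled into fine-tuned weights": MemoryLabel(Form.PARAMETRIC, Function.EXPERIENTIAL),
    "hidden state carried across inference steps": MemoryLabel(Form.LATENT, Function.WORKING),
}


def by_function(function: Function) -> list[str]:
    """Return the names of all example records that serve the given function."""
    return [name for name, label in EXAMPLES.items() if label.function is function]
```

Because the axes are independent, any of the three forms can in principle be paired with any of the three functions, which gives nine possible cells for placing a system.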

Evaluation Highlights

Compiles a list of key benchmarks including LoCoMo, LongMemEval, GAIA, and SWE-bench Verified for evaluating long-horizon agent capabilities
Identifies 3 distinct memory forms (Token, Parametric, Latent) and 3 functional types (Factual, Experiential, Working) to categorize existing literature

Breakthrough Assessment

9/10

Provides a much-needed, rigorous conceptual framework for a confused field. It disentangles agent memory from RAG and context engineering, setting a standard for future research.

⚙️ Technical Details

Problem Definition

Setting: LLM-based agent systems interacting with environments over time, requiring persistent state management

Inputs: Observations o_t, task specifications Q, and interaction history

Outputs: Actions a_t generated by policy π(o_t, m_t, Q) where m_t is retrieved memory

Pipeline Flow

Formation (Input Artifacts → Memory Candidates)
Evolution (Memory Candidates → Persistent Memory State)
Retrieval (Current State + Query → Memory Signal)
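
One way to write a single agent step with these three stages is sketched below. The symbols o_t, m_t, Q, and π come from the problem definition, and F, E, R are the operator names used under System Modules. The symbols φ_t (raw artifacts), c_t (memory candidates), and M_t (persistent memory state) are notation assumed here for illustration and may differ from the survey's own.

```latex
\begin{aligned}
m_t     &= R(M_t,\, o_t,\, Q)   && \text{Retrieval: current state and query} \to \text{memory signal} \\
a_t     &= \pi(o_t,\, m_t,\, Q) && \text{Action from the policy} \\
c_t     &= F(\phi_t)            && \text{Formation: artifacts} \to \text{candidates} \\
M_{t+1} &= E(M_t,\, c_t)        && \text{Evolution: candidates} \to \text{persistent memory state}
\end{aligned}
```

Read this way, retrieval feeds the policy within a step, while formation and evolution update the memory that the next step will see.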

System Modules

Memory Formation Operator (F) (Dynamics)

Selectively transforms raw artifacts (tool outputs, reasoning traces) into memory candidates

Model or implementation: Not applicable (conceptual operator)

Memory Evolution Operator (E) (Dynamics)

Integrates candidates into the memory base via consolidation, conflict resolution, or forgetting

Model or implementation: Not applicable (conceptual operator)

Memory Retrieval Operator (R) (Dynamics)

Constructs queries and retrieves relevant memory content for the agent's policy

Model or implementation: Not applicable (conceptual operator)
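
Although the three operators are conceptual, a toy instantiation can make their division of labor concrete. The sketch below assumes a token-level store (a plain list of text items) and deliberately simple rules: formation keeps non-empty artifacts, evolution deduplicates and forgets the oldest items beyond a capacity, and retrieval ranks by word overlap. All names (`MemoryItem`, `TokenMemory`, `form`, `evolve`, `retrieve`) and all rules are hypothetical choices for illustration, not the survey's method.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    text: str
    hits: int = 0  # how many times retrieval has returned this item


@dataclass
class TokenMemory:
    """Hypothetical token-level store: an ordered list of text items."""
    items: list[MemoryItem] = field(default_factory=list)
    capacity: int = 3


def form(artifacts: list[str]) -> list[MemoryItem]:
    """Formation (F): turn raw artifacts into memory candidates.

    Assumption: an artifact is worth keeping if it is non-empty after stripping.
    """
    return [MemoryItem(a.strip()) for a in artifacts if a.strip()]


def evolve(memory: TokenMemory, candidates: list[MemoryItem]) -> TokenMemory:
    """Evolution (E): consolidate duplicates, then forget the oldest overflow."""
    known = {item.text for item in memory.items}
    for cand in candidates:
        if cand.text not in known:  # consolidation: skip exact duplicates
            memory.items.append(cand)
            known.add(cand.text)
    overflow = len(memory.items) - memory.capacity
    if overflow > 0:  # forgetting: drop the oldest items first
        memory.items = memory.items[overflow:]
    return memory


def retrieve(memory: TokenMemory, query: str, k: int = 2) -> list[str]:
    """Retrieval (R): return up to k items ranked by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(it.text.lower().split())), it) for it in memory.items]
    scored = [(score, it) for score, it in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [it for _, it in scored[:k]]
    for it in top:
        it.hits += 1
    return [it.text for it in top]
```

For example, after `evolve(memory, form(["user prefers dark mode", "  ", "tool call failed on port 8080"]))`, calling `retrieve(memory, "what mode does the user prefer")` returns only the dark-mode item. A parametric or latent memory would keep the same three-operator interface while replacing the list with weight updates or hidden states.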

Novel Architectural Elements

Formalization of the memory lifecycle into three distinct operators: Formation, Evolution, and Retrieval, independent of specific storage implementation (vector DB vs. graph vs. weights)
Unified taxonomy integrating Token-level, Parametric, and Latent forms under a single agentic framework

Comparison to Prior Work

vs. LLM Memory: Agent memory focuses on persistent cognitive states and cross-task evolution, whereas LLM memory focuses on extending effective context length
vs. RAG: Agent memory is self-evolving and accumulates internal experience from interactions, whereas RAG typically accesses static external data
vs. Context Engineering: Agent memory manages 'what the agent knows' (cognitive scope), while context engineering manages 'what fits in the window' (resource scope)

Limitations

The survey taxonomy relies on the current state of 2025/2026 literature and may need updates as new forms (e.g., purely neuro-symbolic hybrids) emerge
Distinctions between RAG and Agent Memory are acknowledged to be blurring, making strict categorization difficult in some edge cases (e.g., Agentic RAG)
Does not provide empirical benchmarks of its own, but rather aggregates existing ones

Reproducibility

Code: https://github.com/Shichun-Liu/Agent-Memory-Paper-List

The paper is a survey and does not propose a specific model to reproduce. The GitHub repository above contains the list of papers discussed.

📊 Experiments & Results

Evaluation Setup

Survey of existing benchmarks and evaluation protocols

Benchmarks:

LoCoMo (Long-context dialogue evaluation)
LongMemEval (Long-term memory evaluation)
GAIA (Complex problem-solving)
SWE-bench Verified (Software engineering / Code generation)

Metrics:

Statistical methodology: Not explicitly reported in the paper

Main Takeaways

Agent memory is distinct from LLM memory and RAG because it implies a persistent, evolving identity and experience base
Memory forms are shifting from simple text buffers (Token-level) to more complex Parametric (weight-based) and Latent (hidden state) representations
Functional roles of memory are specializing into Factual (what I know), Experiential (what I have done), and Working (what I am doing now)
Future frontiers include automating memory management (removing human-crafted schemas), integrating Reinforcement Learning for memory operations, and developing shared memory for multi-agent systems

📚 Prerequisite Knowledge

Prerequisites

Understanding of Large Language Models (LLMs) and their context window limitations
Familiarity with Retrieval-Augmented Generation (RAG)
Basic knowledge of reinforcement learning (RL) concepts (policy, trajectory, environment)

Key Terms

Token-level Memory: Memory stored as discrete, inspectable units (text, JSON, visual tokens) external to model parameters

Parametric Memory: Memory encoded implicitly within the model's neural network weights, often updated via fine-tuning or gradients

Latent Memory: Memory represented as continuous vectors (hidden states or activations) that persist across inference steps

Factual Memory: Storage of declarative knowledge about users (profiles) and the environment (world state)

Experiential Memory: Storage of procedural knowledge, such as past successful plans, failures, or distilled skills

Working Memory: Temporary workspace for managing information relevant to the current active task or reasoning chain

RAG: Retrieval-Augmented Generation. Typically grounds generation in static external databases, in contrast to self-evolving agent memory

Context Engineering: Optimizing the information payload within the LLM's finite context window (resource management), distinct from the cognitive scope of memory

KV Cache: Key-Value Cache. Stores pre-computed attention keys and values to speed up generation; often confused with agent memory

Graph RAG: Structuring knowledge as a graph (nodes/edges) to enable relational retrieval, used in both RAG and agent memory systems

MCP: Model Context Protocol. A standard for connecting AI assistants to data systems; falls under context engineering