
Rethinking Memory in LLM based Agents: Representations, Operations, and Emerging Topics

Yiming Du, Wenyu Huang, Danna Zheng, Zhaowei Wang, Sébastien Montella, Mirella Lapata, Kam-Fai Wong, Jeff Z. Pan
The Chinese University of Hong Kong, The University of Edinburgh, The Hong Kong University of Science and Technology, Huawei Technologies Research & Development (UK) Limited
arXiv (2025)
Memory Agent RAG P13N Benchmark

📝 Paper Summary

Memory organization Memory recall Layered memory
This paper establishes a unified taxonomy for agent memory, categorizing it by representation (parametric vs. contextual) and defining six core operations, grouped under encoding, evolving, and adapting, to organize current research and benchmarks.
Core Problem
Existing surveys on LLM agent memory focus on high-level applications (like personalization) or specific subtopics (like long-context modeling) without a unified framework defining atomic operations or structural foundations.
Why it matters:
  • Lack of a unified framework fragments research, making it difficult to understand how different memory mechanisms (e.g., KV cache eviction vs. knowledge graph storage) relate or interact
  • Current literature overlooks the complete memory lifecycle, often focusing on retrieval while neglecting critical operations like consolidation, forgetting, and condensation
  • Developers lack structured guidance on selecting appropriate memory types (parametric vs. contextual) and operations for building robust, long-term capable agents
Concrete Example: Current benchmarks reveal a disconnect: models achieve >90 Recall@5 on retrieval tasks (e.g., 2Wiki) but lag by >30 points in generation metrics (F1), indicating that high retrievability does not guarantee effective memory utilization due to poor condensation or reasoning.
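To make the retrieval-versus-generation gap concrete, here is a minimal sketch of the two kinds of metric being compared. It assumes Recall@k is computed over passage identifiers and that F1 is the common token-overlap variant used in QA evaluation; the benchmarks cited above may define either metric differently. The function names and the toy data are hypothetical.

```python
from collections import Counter


def recall_at_k(retrieved, relevant, k=5):
    """Fraction of relevant items that appear in the top-k retrieved list."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(set(retrieved[:k]) & relevant_set) / len(relevant_set)


def token_f1(prediction, reference):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Toy case (made-up data): both supporting passages are retrieved,
# yet the generated answer only partly matches the reference.
retrieved_ids = ["p7", "p2", "p9", "p4", "p1"]
gold_ids = ["p2", "p4"]
retrieval_score = recall_at_k(retrieved_ids, gold_ids, k=5)
generation_score = token_f1("the film was directed by John Smith", "Jane Smith")
```

In this toy case retrieval is perfect (Recall@5 of 1.0) while the answer F1 is only about 0.22, which mirrors the pattern described above: the evidence was found, but it was not used well.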
Key Novelty
Operational Taxonomy for Agent Memory
  • Formalizes memory into two representations: Parametric (implicit in weights) and Contextual (explicit external data), bridging the gap between model fine-tuning and RAG
  • Defines six atomic operations governing the memory lifecycle: Consolidation (writing), Indexing (organizing), Updating (modifying), Forgetting (removing), Retrieval (accessing), and Condensation (compressing)
  • Introduces the Relative Citation Index (RCI) to analyze research trends, normalizing citation counts by publication age to identify emerging high-impact topics like KV cache optimization
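The six operations listed above can be sketched on a toy contextual memory. This is an illustration only, not the paper's implementation: the class `ToyContextualMemory`, its keyword index, and the truncation-based condensation are all hypothetical simplifications chosen to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContextualMemory:
    """Hypothetical keyword-indexed store illustrating the six operations."""
    entries: dict = field(default_factory=dict)        # entry id -> text
    keyword_index: dict = field(default_factory=dict)  # word -> set of entry ids
    next_id: int = 0

    def consolidate(self, text):
        """Consolidation: write a new piece of experience into memory."""
        entry_id = self.next_id
        self.next_id += 1
        self.entries[entry_id] = text
        self.index(entry_id)
        return entry_id

    def index(self, entry_id):
        """Indexing: organize an entry under its words for later lookup."""
        for word in set(self.entries[entry_id].lower().split()):
            self.keyword_index.setdefault(word, set()).add(entry_id)

    def _unindex(self, entry_id):
        """Helper: drop an entry's ids from the keyword index."""
        for word in set(self.entries[entry_id].lower().split()):
            self.keyword_index.get(word, set()).discard(entry_id)

    def update(self, entry_id, new_text):
        """Updating: modify an existing entry and re-index it."""
        self._unindex(entry_id)
        self.entries[entry_id] = new_text
        self.index(entry_id)

    def forget(self, entry_id):
        """Forgetting: remove an entry and its index references."""
        self._unindex(entry_id)
        del self.entries[entry_id]

    def retrieve(self, query, k=3):
        """Retrieval: rank entries by the number of shared query words."""
        scores = {}
        for word in set(query.lower().split()):
            for entry_id in self.keyword_index.get(word, ()):
                scores[entry_id] = scores.get(entry_id, 0) + 1
        ranked = sorted(scores, key=lambda i: (-scores[i], i))
        return [self.entries[i] for i in ranked[:k]]

    def condense(self, texts, max_words=12):
        """Condensation: compress retrieved text (here, naive truncation)."""
        words = " ".join(texts).split()
        return " ".join(words[:max_words])


memory = ToyContextualMemory()
diet = memory.consolidate("User prefers vegetarian restaurants")
home = memory.consolidate("User lives in Edinburgh")
memory.update(home, "User lives in Hong Kong")   # the fact changed
memory.forget(diet)                              # the fact is no longer needed
hits = memory.retrieve("where does the user live", k=2)
summary = memory.condense(hits, max_words=3)
```

A parametric memory would expose the same lifecycle very differently (for example, updating through model editing rather than overwriting a record), which is the contrast the two-representation split is meant to capture.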
Architecture
Figure 1: A unified framework of memory in LLM-based agents, mapping Taxonomy (Structured/Unstructured, Parametric/Contextual), Operations (Encoding, Evolving, Adapting), and High-Impact Topics.
Evaluation Highlights
  • Analysis of >30,000 papers reveals a gap between retrieval and generation: retrieval recall is often >90% while generation F1 scores drop to ~60% on benchmarks like LoCoMo and MemoryBank
  • Identifies that long-term memory benchmarks (e.g., LoCoMo) span 20-30 turns but largely ignore dynamic operations like forgetting or updating, focusing instead on static QA
  • Demonstrates via RCI analysis that 'KV cache eviction' and 'context compression' are rapidly growing high-impact topics within the long-context memory domain
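The summary says only that RCI normalizes citation counts by publication age, so the following is a sketch of one plausible form rather than the paper's exact definition. It assumes RCI is the ratio of a paper's observed citations to the citations expected for its age, with the expectation fit by a least-squares regression in log-log space. The `+ 1` smoothing, the function names, and the toy data are all assumptions made for illustration.

```python
import math


def fit_log_log(ages, citations):
    """Least-squares fit of log(citations + 1) = intercept + slope * log(age).

    Ages must be positive (e.g. years since publication) and not all equal.
    """
    xs = [math.log(a) for a in ages]
    ys = [math.log(c + 1) for c in citations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


def relative_citation_index(ages, citations):
    """Observed (smoothed) citations divided by the age-expected value."""
    intercept, slope = fit_log_log(ages, citations)
    return [
        (c + 1) / math.exp(intercept + slope * math.log(a))
        for a, c in zip(ages, citations)
    ]


# Made-up data: two papers of the same age with very different citation counts,
# plus two older papers that anchor the age trend.
ages_in_years = [0.5, 0.5, 2.0, 4.0]
citation_counts = [40, 5, 60, 90]
rci = relative_citation_index(ages_in_years, citation_counts)
```

Under this assumed definition, a value above 1 marks a paper that is cited more than its age alone would predict, which is how a young but fast-growing topic such as KV cache eviction can stand out against older, more heavily cited work.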
Breakthrough Assessment
9/10
Provides a much-needed, comprehensive framework that unifies disparate memory research (RAG, long-context, model editing) into a single operational taxonomy, significantly clarifying the field.