
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents

(Singapore) Guibin Zhang, Muxin Fu, Shuicheng Yan
National University of Singapore
arXiv, 9/2025
Memory Agent RL Reasoning

📝 Paper Summary

Memory internalization Agent evolution
MemGen equips LLM agents with a dynamic memory system that generates latent tokens on demand during reasoning, interleaving memory and cognition without modifying the core model's weights.
Core Problem
Existing agent memory paradigms either rely on rigid retrieval from external databases (lacking fluid integration with reasoning) or update model parameters directly (causing catastrophic forgetting).
Why it matters:
  • Parametric memory methods like SFT suffer from catastrophic forgetting when learning new tasks, eroding general knowledge
  • Retrieval-based memory (RAG, ExpeL) depends on context engineering and often retrieves static information only once, at the start of a task, so it cannot support dynamic, multi-step reasoning
  • Current systems lack the human-like ability to fluidly interweave memory recall with ongoing thought processes
Concrete Example: In a task like 'Find a flight from JFK to LAX and book a ride', a retrieval-based agent might fetch all flight information at the start. If it later discovers during execution that the API is down, it has no mechanism to recall an alternative strategy (e.g., 'use an iterative search paradigm') mid-reasoning.
Key Novelty
Dynamic Generative Latent Memory (MemGen)
  • Decouples memory from the core reasoner by using a separate 'Memory Weaver' module that generates machine-native latent tokens (memory) only when triggered
  • Introduces a 'Memory Trigger' acting as a metacognitive monitor that decides exactly when to pause reasoning and insert memory tokens based on the agent's current hidden states
  • Treats memory as a generative act of reconstruction rather than static retrieval, allowing the agent to synthesize bespoke cognitive context on the fly
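The control flow described in the bullets above can be illustrated with a minimal sketch. It is not the authors' implementation: memory_trigger, memory_weaver, and frozen_reasoner_step are hypothetical toy stand-ins (a linear probe, a tanh map, and a mean over the context), whereas the paper's modules are learned. The sketch only shows the interleaving: at each step the trigger inspects the current hidden state and, if it fires, the weaver's latent vectors are appended to the context before reasoning continues, while the reasoner itself is never updated.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def memory_trigger(hidden: Vector, weights: Vector, threshold: float = 0.5) -> bool:
    """Hypothetical trigger: a linear probe on the current hidden state."""
    score = sigmoid(sum(h * w for h, w in zip(hidden, weights)))
    return score > threshold


def memory_weaver(hidden: Vector, num_latents: int = 2) -> List[Vector]:
    """Hypothetical weaver: generate latent memory vectors from the hidden state."""
    return [[math.tanh(h + k) for h in hidden] for k in range(1, num_latents + 1)]


def frozen_reasoner_step(context: List[Vector]) -> Vector:
    """Toy stand-in for the frozen core model: the mean of the context vectors."""
    dim = len(context[0])
    return [sum(v[i] for v in context) / len(context) for i in range(dim)]


def generate(
    prompt: List[Vector], steps: int, trigger_weights: Vector
) -> Tuple[List[Vector], List[int]]:
    """Interleave reasoning steps with on-demand latent memory insertion."""
    context = list(prompt)
    insert_steps: List[int] = []
    for step in range(steps):
        hidden = frozen_reasoner_step(context)
        if memory_trigger(hidden, trigger_weights):
            # Pause reasoning, weave latent memory into the context, then resume.
            context.extend(memory_weaver(hidden))
            insert_steps.append(step)
            hidden = frozen_reasoner_step(context)
        context.append(hidden)
    return context, insert_steps


if __name__ == "__main__":
    prompt = [[1.0, 0.5]]
    _, fired = generate(prompt, steps=3, trigger_weights=[10.0, 10.0])
    print("memory inserted at steps:", fired)
```

In this toy setup, memory is added only at the steps where the trigger fires, in contrast to a retrieve-once design that fixes its context before reasoning begins. The threshold and the number of latent vectors per insertion are illustrative parameters, not values from the paper.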
Evaluation Highlights
  • Improves over vanilla baselines by 31.7% on ALFWorld and 27.1% on KodCode with Qwen3-8B
  • Surpasses parametric memory methods (REINFORCE++) by 5.8% and retrieval systems (ExpeL, AWM) by up to 38.22% on ALFWorld
  • Demonstrates strong cross-domain generalization: training on math tasks improves science reasoning (+6.06%) and code generation (+5.1%) without direct supervision in those domains
Breakthrough Assessment
9/10
Proposes a fundamentally new memory paradigm (generative latent memory) that addresses both the rigidity of RAG and the forgetting of SFT. Strong empirical results and an emergent human-like memory hierarchy justify the high score.