G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems

Guibin Zhang, Muxin Fu, Guancheng Wan, Miao Yu, Kun Wang, Shuicheng Yan
National University of Singapore, Tongji University, University of California, Los Angeles, Agency for Science, Technology and Research, Nanyang Technological University
arXiv (2025)
Memory Agent Reasoning Benchmark

📝 Paper Summary

Tree/graph-based memory Multi-agent Self-evolving Agentic reasoning
G-Memory enables multi-agent systems to self-evolve by organizing lengthy interaction histories into a three-tier graph hierarchy (insight, query, interaction) for retrieving both abstract wisdom and procedural details.
Core Problem
Current multi-agent systems (MAS) lack self-evolution capabilities because their memory mechanisms are either overly simplistic (ignoring interaction nuances) or lack cross-trial persistence.
Why it matters:
  • MAS interaction histories can contain up to 10x more tokens than single-agent ones, overwhelming the context available to conventional retrieval-based memory
  • Existing systems such as MetaGPT store only final results, discarding the valuable collaborative process that explains *why* a solution worked
  • Without structured memory, agent teams repeat mistakes and fail to improve coordination strategies over time
Concrete Example: In an embodied task 'put a clean cloth in countertop', standard agents might fail by not cleaning the cloth first. G-Memory retrieves a past trajectory where an agent was corrected for putting a dirty egg in a microwave, successfully guiding the new team to clean the item first.
Key Novelty
Three-Tier Hierarchical Graph Memory for MAS
  • Organizes memory into three levels: fine-grained utterance logs (Interaction), task metadata and status (Query), and abstract lessons (Insight)
  • Uses bi-directional traversal: moving 'up' to find general strategies (insights) and 'down' to find specific procedural examples (interactions) based on the current task
  • Updates continuously: both successful and failed executions trigger the generation of new insights and graph connections, allowing the team's collective intelligence to grow over time
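The three tiers, the up/down traversal, and the continuous update can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `HierarchicalMemory`, `QueryNode`, `InsightNode`, `tokens`, and `similarity` are hypothetical; Jaccard word overlap stands in for the paper's actual similarity measure; insights are passed in as plain strings instead of being generated after each execution; and returning only successful trajectories on the downward pass is a simplifying assumption. The toy data at the bottom is loosely modeled on the cloth/egg example under Core Problem.

```python
from dataclasses import dataclass, field
from typing import Optional


def tokens(text: str) -> set[str]:
    # Lowercased word set with simple punctuation stripped.
    return {w.strip(".,'\"").lower() for w in text.split()} - {""}


def similarity(a: str, b: str) -> float:
    # Jaccard word overlap in [0, 1]; an assumed stand-in for real retrieval.
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


@dataclass
class QueryNode:
    # Query tier: task text, outcome, and the interaction-tier trajectory.
    task: str
    success: bool
    utterances: list[tuple[str, str]]  # (agent, message) in speaking order


@dataclass
class InsightNode:
    # Insight tier: an abstract lesson plus ids of the queries supporting it.
    text: str
    support: set[int] = field(default_factory=set)


class HierarchicalMemory:
    """Hypothetical three-tier memory sketch, not G-Memory's actual code."""

    def __init__(self, k: int = 2) -> None:
        self.k = k
        self.queries: list[QueryNode] = []
        self.query_edges: dict[int, set[int]] = {}  # query-tier graph
        self.insights: list[InsightNode] = []

    def _top_k(self, task: str) -> list[int]:
        # Ids of the k most similar stored queries with non-zero overlap.
        scored = [(similarity(task, q.task), i) for i, q in enumerate(self.queries)]
        scored = sorted((p for p in scored if p[0] > 0), key=lambda p: (-p[0], p[1]))
        return [i for _, i in scored[: self.k]]

    def add(self, task: str, success: bool, utterances: list[tuple[str, str]],
            insight: Optional[str] = None) -> int:
        # Update: store a finished trial (success or failure) and link it
        # to its nearest existing queries; optionally attach an insight.
        neighbours = self._top_k(task)
        qid = len(self.queries)
        self.queries.append(QueryNode(task, success, utterances))
        self.query_edges[qid] = set(neighbours)
        for n in neighbours:
            self.query_edges[n].add(qid)
        if insight is not None:
            self.insights.append(InsightNode(insight, {qid}))
        return qid

    def retrieve(self, task: str) -> tuple[list[str], list[list[tuple[str, str]]]]:
        seeds = self._top_k(task)
        related = set(seeds)
        for s in seeds:  # expand one hop over the query graph
            related |= self.query_edges[s]
        # Upward: insights supported by any related query.
        insights = [ins.text for ins in self.insights if ins.support & related]
        # Downward: full trajectories of the closest successful queries.
        trajectories = [self.queries[s].utterances for s in seeds
                        if self.queries[s].success]
        return insights, trajectories


# Usage with toy data (hypothetical tasks, utterances, and insight).
mem = HierarchicalMemory(k=2)
mem.add("put a clean egg in microwave", True,
        [("Solver", "take egg from fridge"),
         ("Critic", "the egg is dirty; clean it at the sinkbasin first"),
         ("Solver", "clean egg with sinkbasin, put egg in microwave")],
        insight="Clean an object at the sinkbasin before placing it when the task says 'clean'.")
mem.add("find two pencils and put them in drawer", False,
        [("Solver", "put pencil in drawer")])

insights, trajs = mem.retrieve("put a clean cloth in countertop")
print(insights)
print(len(trajs))
```

Linking each new query to its nearest neighbours at insertion time is what lets a later, related task reach an insight it never produced itself; the failed pencil trial still contributes graph connectivity even though its trajectory is not returned.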
Architecture
Figure 1 (Right): The three-tier hierarchical memory architecture of G-Memory.
Evaluation Highlights
  • +20.89% success rate improvement on ALFWorld (embodied action) using MacNet + Qwen2.5-14B, compared to the original MacNet framework without G-Memory
  • +10.12% accuracy gain on HotpotQA (knowledge reasoning) using DyLAN + GPT-4o-mini compared to DyLAN with no memory
  • Consumes only 1.4M additional tokens for a 10.32% performance gain on PDDL, whereas MetaGPT-M consumes 2.2M tokens for only a 4.07% gain
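Normalizing each reported PDDL gain by its reported token cost makes the efficiency comparison concrete. This worked arithmetic assumes both token figures measure the same kind of overhead, which the summary states explicitly ("additional") only for G-Memory.

```latex
\frac{10.32\%}{1.4\,\text{M tokens}} \approx 7.37\%\ \text{per M tokens},
\qquad
\frac{4.07\%}{2.2\,\text{M tokens}} = 1.85\%\ \text{per M tokens},
\qquad
\frac{7.37}{1.85} \approx 4.0
```

Under that assumption, G-Memory yields roughly four times as much gain per million tokens as MetaGPT-M.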
Breakthrough Assessment
8/10
Delivers significant performance gains across diverse domains (up to ~20%) and offers a principled structural solution to the 'long context' problem in MAS interactions. It addresses a critical gap in MAS self-evolution.