
MIRIX: Multi-Agent Memory System for LLM-Based Agents

Yu Wang, Xi Chen
Not reported in the paper
arXiv.org (2025)
Memory Agent MM Benchmark P13N

📝 Paper Summary

Memory organization Memory recall Multi-modal memory
MIRIX is a multi-agent memory system that organizes user data into six distinct structured components (like episodic and semantic memory) to enable long-term, multimodal recall from high-resolution screen activity.
Core Problem
Existing LLM memory systems rely on flat, text-centric storage that fails to handle the scale, structure, and multimodal nature of real-world user data over time.
Why it matters:
  • Current assistants remain effectively stateless beyond the prompt window, which rules out genuine personalization or adaptation to the user over time.
  • Storing raw multimodal inputs (like constant screenshots) is prohibitively expensive without effective abstraction layers.
  • Flat vector databases lack the structural organization needed to distinguish between procedural instructions, specific events, and general facts.
Concrete Example: A standard RAG system keeps all historical data in a single flat store. When asked about a specific visual event from weeks earlier, buried among 20,000 screenshots, it struggles to retrieve the correct image because nothing in the store separates events from general facts or files. MIRIX instead routes the query to the relevant memory types (Episodic/Resource) for more accurate retrieval.
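The difference between a flat store and type-routed retrieval can be illustrated with a toy example. This is a hedged sketch, not MIRIX's retrieval code: the `MemoryItem` class, the memory-type labels, the hard-coded routing to `{"episodic", "resource"}`, and the sample entries are all hypothetical, and token overlap stands in for embedding similarity.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    memory_type: str  # hypothetical labels, e.g. "episodic", "semantic", "resource"
    text: str


def overlap_score(query: str, text: str) -> int:
    """Toy relevance: count of shared lowercase tokens (stand-in for embedding similarity)."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def flat_search(store: list[MemoryItem], query: str, k: int = 1) -> list[MemoryItem]:
    """Flat RAG-style search: every item competes, whatever kind of memory it is."""
    return sorted(store, key=lambda m: overlap_score(query, m.text), reverse=True)[:k]


def typed_search(store: list[MemoryItem], query: str, types: set[str], k: int = 1) -> list[MemoryItem]:
    """Typed search: only items from the components the query was routed to are scored."""
    candidates = [m for m in store if m.memory_type in types]
    return sorted(candidates, key=lambda m: overlap_score(query, m.text), reverse=True)[:k]


# Hypothetical sample memories.
store = [
    MemoryItem("semantic", "I open the sales dashboard every morning"),
    MemoryItem("episodic", "2025-03-02 14:10 opened the sales dashboard in Chrome"),
    MemoryItem("resource", "Q1 sales report.pdf"),
]
query = "when did I last open the sales dashboard"

print(flat_search(store, query)[0].memory_type)  # -> semantic (the general habit wins on overlap)
print(typed_search(store, query, {"episodic", "resource"})[0].memory_type)  # -> episodic
```

In the flat store the general habit shares more words with the question than the dated event does, so it is returned first; restricting the candidate pool to event and file memories removes that competitor before scoring.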
Key Novelty
Six-Component Multi-Agent Memory Architecture
  • Divides memory into six specialized structures (Core, Episodic, Semantic, Procedural, Resource, Knowledge Vault) rather than a single vector store, mimicking human cognitive organization.
  • Assigns a dedicated 'Memory Manager' agent to each memory type, coordinated by a Meta Memory Manager, to handle the complexity of routing and updating diverse information.
  • Introduces a screenshot-based memory pipeline that continuously abstracts visual activity into structured text and low-redundancy logs, enabling recall over months of usage.
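The manager hierarchy described in these bullets can be sketched as a coordinator that routes each observation to per-component managers. This is a minimal sketch under stated assumptions, not MIRIX's implementation: the class names `MetaMemoryManager` and `MemoryManager`, the keyword rules (standing in for the agent-based routing decision), the choice to send sensitive items only to the Knowledge Vault, and the consecutive-duplicate filter (a crude stand-in for the screenshot pipeline's low-redundancy logging) are all hypothetical.

```python
from dataclasses import dataclass, field

# The six component names follow the summary; the routing rules below are hypothetical.
MEMORY_TYPES = ("core", "episodic", "semantic", "procedural", "resource", "knowledge_vault")

# Keyword rules stand in for the routing decision an LLM-based manager agent would make.
KEYWORD_RULES = {
    "password": "knowledge_vault",
    "api key": "knowledge_vault",
    "i prefer": "core",
    "how to": "procedural",
    ".pdf": "resource",
    " is a ": "semantic",
}


@dataclass
class MemoryManager:
    """Hypothetical per-component manager that owns one store."""
    memory_type: str
    entries: list[str] = field(default_factory=list)

    def update(self, text: str) -> bool:
        # Refuse exact duplicates so this component stays low-redundancy.
        if text in self.entries:
            return False
        self.entries.append(text)
        return True


class MetaMemoryManager:
    """Hypothetical coordinator that routes each observation to component managers."""

    def __init__(self) -> None:
        self.managers = {t: MemoryManager(t) for t in MEMORY_TYPES}
        self.last_observation = None  # text of the previous observation, if any

    def route(self, text: str) -> list[str]:
        lowered = text.lower()
        targets = []
        for keyword, memory_type in KEYWORD_RULES.items():
            if keyword in lowered and memory_type not in targets:
                targets.append(memory_type)
        if "knowledge_vault" in targets:
            return ["knowledge_vault"]  # assumption: keep sensitive values out of other stores
        targets.append("episodic")  # assumption: every other observation is also logged as an event
        return targets

    def ingest(self, text: str) -> list[str]:
        """Store one screen-activity description; return the components actually written."""
        if text == self.last_observation:
            return []  # consecutive identical screenshots add nothing new
        self.last_observation = text
        return [t for t in self.route(text) if self.managers[t].update(text)]


meta = MetaMemoryManager()
print(meta.ingest("Opened Q1 report.pdf in the browser"))  # ['resource', 'episodic']
print(meta.ingest("Opened Q1 report.pdf in the browser"))  # [] (consecutive duplicate)
print(meta.ingest("Saved the AWS API key in settings"))    # ['knowledge_vault']
```

Splitting writes across managers keeps each store narrow, so a later query only has to search the components it is routed to; the price is that routing errors put an item in the wrong place, which is presumably why the paper assigns this decision to agents rather than fixed rules like these.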
Architecture
Figure 1: The overall MIRIX architecture, showing the six memory components and the multi-agent management system.
Evaluation Highlights
  • Achieves 35% higher accuracy than RAG baselines on the new ScreenshotVQA benchmark while reducing storage requirements by 99.9%.
  • Attains 85.38% accuracy on the LOCOMO long-context benchmark, outperforming the best existing method by 8.0%.
  • Outperforms long-context baselines on ScreenshotVQA by 410% while using 93.3% less storage.
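The headline percentages can be restated as multiplicative factors. This assumes that "X% less storage" means the MIRIX footprint is (1 - X/100) of the baseline's, and that "outperforms by 410%" is a relative rather than percentage-point improvement; the summary does not state which reading is intended.

$$
\frac{S_{\text{base}}}{S_{\text{MIRIX}}} = \frac{1}{1 - 0.999} = 1000 \;\; (\text{vs. RAG}), \qquad
\frac{S_{\text{base}}}{S_{\text{MIRIX}}} = \frac{1}{1 - 0.933} \approx 14.9 \;\; (\text{vs. long-context})
$$

$$
\frac{A_{\text{MIRIX}}}{A_{\text{base}}} = 1 + \frac{410}{100} = 5.1 \;\; (\text{vs. long-context on ScreenshotVQA})
$$

Under the same relative reading, the 35% gain over RAG corresponds to a factor of 1.35.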
Breakthrough Assessment
8/10
Strong structural innovation in memory design (six distinct memory types, each managed by its own agent), combined with large storage savings for multimodal data (99.9% reduction) alongside improved accuracy.