
Magentic-One: A Generalist Multi-Agent System for Solving Complex Tasks

Adam Fourney, Gagan Bansal, Hussein Mozannar, Cheng Tan, Eduardo Salinas, E. Zhu, Friederike Niedtner, Grace Proebsting, Griffin Bassman, Jack Gerrits, Jacob Alber, Peter Chang, Ricky Loynd, Robert West, Victor C. Dibia, Ahmed M. Awadallah, Ece Kamar, Rafah Hosn, Saleema Amershi
Microsoft Research
arXiv.org (2024)
Agent Benchmark Reasoning MM

Paper Summary

Multi-agent systems, Agentic workflow orchestration, Generalist agents
Magentic-One is a multi-agent system where a central Orchestrator dynamically plans, tracks progress via structured ledgers, and routes subtasks to specialized agents (Web, File, Code) to solve complex, open-ended problems.
Core Problem
Existing agentic systems often lack the generality to handle diverse, multi-step tasks that require planning, error recovery, and dynamic tool usage across both web and local file environments.
Why it matters:
  • Monolithic single-agent approaches struggle with complex, long-horizon tasks requiring distinct skills (e.g., coding vs. browsing)
  • Rigid workflows cannot adapt to novel errors or changing environments, limiting real-world utility
  • Evaluation of agentic systems is difficult due to side-effects and stochasticity, requiring rigorous containment and repetition controls
Concrete Example: A user asks for a survey and slide deck of recent AI safety papers. A single agent might fail to navigate the web, download PDFs, read them, *and* write code to generate slides in one context. Magentic-One splits this: WebSurfer finds papers, FileSurfer reads them, Coder writes the slide-generation script, and ComputerTerminal executes it.
Key Novelty
Ledger-based Orchestrator for Multi-Agent Dynamic Routing
  • Uses a central Orchestrator that maintains two structured ledgers (Task Ledger for overall plan/facts, Progress Ledger for immediate history) to manage short-term memory and planning
  • Implements a dual-loop workflow: an outer loop for high-level replanning/reflection and an inner loop for step-by-step instruction of specialized agents
  • Modular design allows adding or removing agents (e.g., WebSurfer, Coder) without altering the core Orchestrator logic or requiring additional prompt tuning
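The ledger-based dual loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `TaskLedger`, `ProgressLedger`, `run_orchestrator`, the `"ERROR"` result prefix, and the `max_stalls`/`max_replans` thresholds are all assumptions made for illustration, and the agents and planner are plain Python stubs standing in for LLM-backed components.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]                      # instruction -> result text
Step = Tuple[str, str]                            # (agent name, instruction)
Planner = Callable[[str, List[str]], List[Step]]  # (task, facts) -> plan


@dataclass
class TaskLedger:
    """Outer-loop state: the task, facts gathered so far, and the current plan."""
    task: str
    facts: List[str] = field(default_factory=list)
    plan: List[Step] = field(default_factory=list)


@dataclass
class ProgressLedger:
    """Inner-loop state: step history and a count of consecutive failures."""
    history: List[str] = field(default_factory=list)
    stalls: int = 0


def run_orchestrator(task: str, agents: Dict[str, Agent], planner: Planner,
                     max_stalls: int = 2, max_replans: int = 3):
    """Hypothetical dual-loop driver; returns (done, task_ledger, full_log)."""
    ledger = TaskLedger(task=task)
    log: List[str] = []
    for _ in range(max_replans + 1):              # outer loop: plan or replan
        ledger.plan = planner(ledger.task, ledger.facts)
        progress = ProgressLedger()               # fresh progress ledger per plan
        completed = True
        for name, instruction in ledger.plan:     # inner loop: one step at a time
            step_ok = False
            while progress.stalls <= max_stalls:  # retry until the stall budget is spent
                result = agents[name](instruction)
                progress.history.append(f"{name}: {result}")
                if result.startswith("ERROR"):    # assumed failure convention
                    progress.stalls += 1
                else:
                    progress.stalls = 0
                    ledger.facts.append(result)
                    step_ok = True
                    break
            if not step_ok:                       # stalled: record it and replan
                ledger.facts.append(f"{name} failed on: {instruction}")
                completed = False
                break
        log.extend(progress.history)
        if completed:
            return True, ledger, log
    return False, ledger, log


def make_flaky_terminal(failures: int) -> Agent:
    """Stub terminal agent that fails a fixed number of times, then succeeds."""
    calls = {"n": 0}

    def terminal(instruction: str) -> str:
        calls["n"] += 1
        if calls["n"] <= failures:
            return "ERROR: script crashed"
        return f"executed '{instruction}'"

    return terminal


def demo():
    """Run the sketch with three stub agents and a fixed, hypothetical plan."""
    agents: Dict[str, Agent] = {
        "WebSurfer": lambda instr: f"web result for '{instr}'",
        "Coder": lambda instr: f"script for '{instr}'",
        "ComputerTerminal": make_flaky_terminal(failures=3),
    }

    def planner(task: str, facts: List[str]) -> List[Step]:
        return [("WebSurfer", "find papers"),
                ("Coder", "write slide script"),
                ("ComputerTerminal", "run slide script")]

    return run_orchestrator("survey and slide deck", agents, planner)


if __name__ == "__main__":
    done, ledger, log = demo()
    print(done, len(log))
```

In this sketch, agents live in a plain dictionary, so registering or dropping one does not touch `run_orchestrator`, which mirrors the modularity claim only loosely. The stub terminal fails three times in a row, which exceeds the assumed stall budget and forces one pass through the outer replanning loop before the task completes.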
Architecture
Figure 2: The Magentic-One architecture workflow, illustrating the Orchestrator's interaction with the Task/Progress Ledgers and the specialized agents.
Evaluation Highlights
  • Achieves a 38% completion rate on the GAIA benchmark (validation set), statistically competitive with the state of the art
  • Achieves 32.8% completion on WebArena, performing competitively against specialized web-only agents
  • Attains 27.7% accuracy on AssistantBench, demonstrating capability in realistic user-assistant tasks
Breakthrough Assessment
8/10
Strong empirical results across diverse benchmarks (WebArena, GAIA) using a unified, generalist architecture. The ledger-based orchestration offers a clean, extensible paradigm for multi-agent coordination.