โ† Back to Paper List

Magentic-UI: Towards Human-in-the-loop Agentic Systems

Hussein Mozannar, Gagan Bansal, Cheng Tan, Adam Fourney, Victor C. Dibia, Jingya Chen, Jack Gerrits, Tyler Payne, Matheus Kunzler Maldaner, Madeleine Grunde-McLaughlin, Eric Zhu, Griffin Bassman, Jacob Alber, Peter Chang, Ricky Loynd, Friederike Niedtner, Ece Kamar, Maya Murad, Rafah Hosn, Saleema Amershi
Microsoft Research
arXiv.org (2025)
Agent Memory Benchmark

๐Ÿ“ Paper Summary

Human-in-the-loop (HITL) agents · Web and OS agents · Human-Agent Interaction (HAI)
Magentic-UI is an open-source interface that integrates humans into multi-agent workflows through specific interaction mechanisms like co-planning and co-tasking, aiming to boost reliability and safety in complex agentic tasks.
Core Problem
Autonomous agents currently fail to achieve human-level performance on complex tasks (e.g., browsing, coding) and introduce safety risks like misalignment and irreversible actions when operating without oversight.
Why it matters:
  • Current agents struggle with long-horizon tasks (minutes to hours), leading to wasted time and compounding errors if left unchecked.
  • Agents acting directly on the real world (web/OS) create new attack surfaces for adversarial manipulation and safety violations.
  • Completely autonomous systems often fail to capture user intent or specific constraints that are difficult to specify upfront.
Concrete Example: A user asks an agent to 'buy a charger for my Surface laptop.' An autonomous agent might plan to buy it on Amazon. However, the user knows it's only officially sold on Microsoft.com. Without co-planning, the agent wastes time searching the wrong site or buys an incompatible third-party item.
Key Novelty
Magentic-UI (Multiagentic-UserInterface)
  • Treats the human user as a distinct agent within a multi-agent team, managed by an Orchestrator that delegates tasks to the human when necessary.
  • Introduces six specific interaction patterns (co-planning, co-tasking, action guards, etc.) to operationalize human oversight without overwhelming the user.
  • Embeds a live browser within the agent interface, allowing seamless control hand-offs where the user can physically intervene in the agent's browsing session.
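The first two ideas above (the human as a team member, and co-planning before execution) can be illustrated with a minimal sketch. This is not Magentic-UI's actual API: `Step`, `co_plan`, `run_plan`, `web_surfer`, `scripted_user`, and `user_review` are hypothetical names, the "confirm the shipping address" step is invented for illustration, and the human is simulated by scripted functions rather than a live interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    description: str
    assignee: str  # name of the team member (agent or human) handling the step


def co_plan(draft: List[Step],
            review: Callable[[List[Step]], List[Step]]) -> List[Step]:
    """Co-planning: the human reviews and may edit the draft before anything runs."""
    return review(draft)


def run_plan(plan: List[Step],
             team: Dict[str, Callable[[str], str]]) -> List[str]:
    """Toy orchestrator loop: route each step to its assignee, human included."""
    log = []
    for step in plan:
        handler = team[step.assignee]
        log.append(f"{step.assignee}: {handler(step.description)}")
    return log


def web_surfer(task: str) -> str:
    # Hypothetical stand-in for a browsing agent.
    return f"completed '{task}'"


def scripted_user(task: str) -> str:
    # Hypothetical stand-in for a human answering a delegated step.
    return "confirmed shipping address"


def user_review(draft: List[Step]) -> List[Step]:
    # The user knows the charger is officially sold on Microsoft.com.
    return [Step(s.description.replace("Amazon", "Microsoft.com"), s.assignee)
            for s in draft]


draft = [
    Step("Search Amazon for a Surface laptop charger", "web_surfer"),
    Step("Confirm the shipping address", "user"),
]
plan = co_plan(draft, user_review)
team = {"web_surfer": web_surfer, "user": scripted_user}
log = run_plan(plan, team)
```

Registering the user in the same `team` mapping as the agents is the point of the sketch: the loop does not need a special case to hand a step to the human.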
Architecture
Figure 2: The Magentic-UI interface layout and its components.
Evaluation Highlights
  • Simulated-user testing on the GAIA benchmark shows that Magentic-UI facilitates human intervention, although autonomous success rates remain at baseline levels (e.g., 29.3% on Level 1 validation).
  • Qualitative studies demonstrate the utility of 'co-tasking' (interrupting execution) for handling captchas and correcting navigation errors.
  • Safety assessments confirm 'action guards' prevent high-stakes actions (e.g., irreversible purchases) until explicit human approval is granted.
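The action-guard behavior described above can be sketched as a wrapper that withholds a high-stakes action until a human approves it. This is a simplified illustration under stated assumptions, not the system's implementation: the keyword list, `needs_approval`, and `guarded_execute` are hypothetical, and how Magentic-UI actually decides which actions are high-stakes is not specified in this summary.

```python
from typing import Callable

# Assumption: a crude keyword heuristic marks an action as high-stakes.
IRREVERSIBLE_KEYWORDS = ("purchase", "delete", "send")


def needs_approval(action: str) -> bool:
    """Return True when the action description matches a high-stakes keyword."""
    lowered = action.lower()
    return any(keyword in lowered for keyword in IRREVERSIBLE_KEYWORDS)


def guarded_execute(action: str,
                    execute: Callable[[str], str],
                    approve: Callable[[str], bool]) -> str:
    """Run the action only if it is low-stakes or the human explicitly approves."""
    if needs_approval(action) and not approve(action):
        return f"blocked: {action}"
    return execute(action)


def execute(action: str) -> str:
    # Hypothetical stand-in for the agent actually performing the action.
    return f"executed: {action}"


denied = guarded_execute("Purchase charger", execute, lambda a: False)
approved = guarded_execute("Purchase charger", execute, lambda a: True)
harmless = guarded_execute("Scroll down the page", execute, lambda a: False)
```

In this sketch the approval callback is never consulted for low-stakes actions, which is one way to keep oversight from overwhelming the user.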
Breakthrough Assessment
7/10
While the underlying agent performance isn't a breakthrough, the system architecture for Human-Agent Interaction (viewing the human as a tool/agent) and the open-source platform for studying these interactions are significant contributions.