
Agent S: An Open Agentic Framework that Uses Computers Like a Human

Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, Xin Eric Wang
Simular Research
arXiv (2024)
Agent Memory MM RAG Benchmark

📝 Paper Summary

Memory organization Memory recall Self-evolving Agentic reasoning
Agent S automates complex computer tasks by combining hierarchical planning with external knowledge and internal memory, refining its performance through self-supervised experience accumulation and a specialized interface.
Core Problem
Automating OS-level tasks fails because agents lack domain knowledge for diverse apps, struggle with long-horizon planning, and cannot precisely ground actions on dynamic, non-uniform GUIs.
Why it matters:
  • Current GUI agents struggle to generalize across the vast range of constantly evolving desktop applications and websites
  • Long-horizon tasks require tracking progress and intermediate subgoals, which standard flat-planning agents often lose track of
  • Precise mouse/keyboard control is difficult for MLLMs due to a lack of internal coordinate systems and the need to process dense visual information
Concrete Example: In a long-horizon task like 'create a calendar invite based on an email', a standard agent might successfully open the calendar but fail to copy specific details or lose track of the date after switching windows, whereas Agent S retrieves past successful subtask experiences to guide the specific form-filling steps.
Key Novelty
Experience-Augmented Hierarchical Planning with Continual Memory Update
  • Decomposes tasks into subtasks where the high-level planner uses 'Narrative Memory' (abstract summaries of past full tasks) and 'Online Web Knowledge' to form a strategy
  • Low-level workers execute subtasks using 'Episodic Memory' (detailed step-by-step traces) to guide specific actions, updating memory with self-evaluated success/failure summaries
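The two-level loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Memory` class, `run_task`, the injected `plan_fn`/`act_fn`/`web_fn` callables, and the token-overlap retrieval are all hypothetical stand-ins for the LLM calls and retrieval used in the actual system.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


def _overlap(a: str, b: str) -> float:
    """Jaccard token overlap; an assumed stand-in for the real similarity measure."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if (ta | tb) else 0.0


@dataclass
class Memory:
    """Hypothetical key -> summary store with nearest-key lookup."""
    entries: Dict[str, str] = field(default_factory=dict)

    def retrieve(self, query: str) -> Optional[str]:
        if not self.entries:
            return None
        best = max(self.entries, key=lambda k: _overlap(k, query))
        return self.entries[best] if _overlap(best, query) > 0 else None

    def update(self, key: str, summary: str) -> None:
        self.entries[key] = summary


def run_task(
    task: str,
    plan_fn: Callable[[str, Optional[str], str], List[str]],
    act_fn: Callable[[str, Optional[str]], bool],
    web_fn: Callable[[str], str],
    narrative: Memory,
    episodic: Memory,
) -> bool:
    # Manager: form a strategy from narrative memory plus web knowledge.
    subtasks = plan_fn(task, narrative.retrieve(task), web_fn(task))
    results = []
    for sub in subtasks:
        # Worker: execute one subtask, guided by any matching episodic trace.
        ok = act_fn(sub, episodic.retrieve(sub))
        # Write a self-evaluated success/failure summary back to episodic memory.
        episodic.update(sub, f"{'success' if ok else 'failure'}: {sub}")
        results.append(ok)
    succeeded = bool(results) and all(results)
    # Store an abstract summary of the whole task in narrative memory.
    outcome = "success" if succeeded else "failure"
    narrative.update(task, f"{outcome}: {len(subtasks)} subtasks")
    return succeeded
```

In this sketch, both memories grow after every run, so a later task with a similar description retrieves the earlier summary as a hint. That is the continual-update idea in miniature.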
Architecture
Figure 3: The complete Agent S framework, detailing the interaction between the User, Manager, Worker, and the Environment via the Agent-Computer Interface (ACI).
Evaluation Highlights
  • Achieves 20.58% success rate on OSWorld, outperforming the baseline by 9.37 percentage points (83.6% relative improvement)
  • Establishes new state-of-the-art on OSWorld across multiple categories including daily tasks and professional workflows
  • Generalizes to WindowsAgentArena with 18.2% success rate (vs 13.3% baseline) without explicit adaptation, showing cross-OS robustness
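The relative-improvement figure follows from the two reported OSWorld numbers. The short check below derives the implied baseline (20.58 − 9.37 = 11.21) and recomputes the percentage. The WindowsAgentArena relative gain is computed the same way from the reported 18.2% and 13.3%; it is a derived value, not one stated in the summary. The function name is illustrative.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# OSWorld: success rate and absolute gain as reported in the summary.
osworld_new = 20.58
osworld_gain_pp = 9.37
osworld_baseline = osworld_new - osworld_gain_pp  # implied baseline, about 11.21

print(round(relative_improvement(osworld_new, osworld_baseline), 1))  # 83.6

# WindowsAgentArena: derived here from the two reported rates.
print(round(relative_improvement(18.2, 13.3), 1))  # 36.8
```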
Breakthrough Assessment
8/10
Significant jump in SOTA on the difficult OSWorld benchmark. The framework effectively combines hierarchical planning with a retrieval-based memory system that learns from experience, addressing key bottlenecks in long-horizon GUI automation.