โ† Back to Paper List

Agent Workflow Memory

Z. Wang, Jiayuan Mao, Daniel Fried, Graham Neubig
Carnegie Mellon University, Massachusetts Institute of Technology
International Conference on Machine Learning (2024)
Memory Agent Benchmark

Paper Summary

Agentic AI Memory
Agent Workflow Memory enables web agents to abstract reusable sub-routines from past experiences and store them as workflows to guide future long-horizon tasks.
Core Problem
Current agents solve tasks in isolation, failing to learn from past successes or extract reusable routines, making them brittle when task contexts change.
Why it matters:
  • Agents that do not adapt over time waste computation solving the same sub-problems repeatedly
  • Standard in-context learning with fixed examples lacks robustness to environmental changes (e.g., different websites or domains)
  • Long-horizon tasks require complex trajectories that are difficult to generate from scratch without hierarchical guidance
Concrete Example: When an agent needs to 'get the zip code of a place', a standard agent might fail to plan the necessary steps. In contrast, AWM recalls a previously learned 'find a place by its name' workflow and uses it as a reliable sub-goal to complete the complex task.
Key Novelty
Agent Workflow Memory (AWM)
  • Induces 'workflows' (abstracted routines) from trajectories by replacing specific values (e.g., 'dry cat food') with placeholders (e.g., '{product-name}') to ensure reusability
  • Produces a 'snowball effect' in which simple induced workflows serve as building blocks for more complex future workflows
  • Supports both offline induction (from annotated datasets) and online supervision-free induction (from self-generated successful trials)
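The placeholder idea in the first bullet can be illustrated with a minimal sketch. It uses plain string substitution as a stand-in for the induction step, which this summary does not specify, so the `Workflow` class, the `induce_workflow` and `instantiate` helpers, and the action strings are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """A reusable routine: a name plus steps that contain placeholders."""
    name: str
    steps: tuple[str, ...]


def induce_workflow(name: str, trajectory: list[str], slots: dict[str, str]) -> Workflow:
    """Abstract a concrete trajectory by swapping each value for '{slot}'.

    Assumption: simple string replacement stands in for the real induction step.
    """
    steps = []
    for step in trajectory:
        for value, slot in slots.items():
            step = step.replace(value, "{" + slot + "}")
        steps.append(step)
    return Workflow(name, tuple(steps))


def instantiate(workflow: Workflow, bindings: dict[str, str]) -> list[str]:
    """Fill the placeholders of a workflow with values for a new task."""
    steps = []
    for step in workflow.steps:
        for slot, value in bindings.items():
            step = step.replace("{" + slot + "}", value)
        steps.append(step)
    return steps


# Hypothetical trajectory from one past task (action syntax is illustrative).
trajectory = [
    "click(search_box)",
    "type(search_box, 'dry cat food')",
    "click(search_button)",
]
workflow = induce_workflow(
    "search for a product", trajectory, {"dry cat food": "product-name"}
)
reused = instantiate(workflow, {"product-name": "wet dog food"})
```

Because the stored steps no longer mention 'dry cat food', the same workflow can guide a search for any other product.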
Architecture
Figure 4: The online Agent Workflow Memory process loop
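The online loop can be sketched roughly as follows, assuming it proceeds task by task: attempt the task with the current memory, judge the result, and induce workflows only from successful trials. The function `run_online_awm`, its callback parameters, and the toy stubs are hypothetical; in a real agent each callback would wrap a model call.

```python
from typing import Callable


def run_online_awm(
    tasks: list[str],
    attempt: Callable[[str, list[str]], list[str]],
    is_success: Callable[[str, list[str]], bool],
    induce: Callable[[str, list[str]], list[str]],
) -> tuple[list[str], list[bool]]:
    """Process tasks in order, growing workflow memory from successful trials."""
    memory: list[str] = []
    results: list[bool] = []
    for task in tasks:
        trajectory = attempt(task, memory)   # act with current workflows
        ok = is_success(task, trajectory)    # no ground-truth labels assumed
        results.append(ok)
        if ok:
            for wf in induce(task, trajectory):
                if wf not in memory:         # skip duplicate workflows
                    memory.append(wf)
    return memory, results


# Toy stubs (assumptions): the zip-code task only succeeds once the
# 'find a place by its name' workflow is already in memory.
REQUIRES = {"get the zip code of a place": "find a place by its name"}


def toy_attempt(task: str, memory: list[str]) -> list[str]:
    needed = REQUIRES.get(task)
    if needed is not None and needed not in memory:
        return []  # the toy agent cannot plan without the sub-routine
    prefix = [f"reuse: {needed}"] if needed else []
    return prefix + [f"solve: {task}"]


def toy_is_success(task: str, trajectory: list[str]) -> bool:
    return bool(trajectory)


def toy_induce(task: str, trajectory: list[str]) -> list[str]:
    return [task]  # name the induced workflow after the task


ordered = ["find a place by its name", "get the zip code of a place"]
memory, results = run_online_awm(ordered, toy_attempt, toy_is_success, toy_induce)
rev_memory, rev_results = run_online_awm(
    ordered[::-1], toy_attempt, toy_is_success, toy_induce
)
```

In this toy setup, task order matters: the harder task succeeds only after the simpler workflow has been stored, which mirrors the snowball effect described under Key Novelty.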
Evaluation Highlights
  • +51.1% relative improvement in success rate on WebArena compared to the top published autonomous method (Drouin et al., 2024)
  • +24.6% relative improvement in success rate on Mind2Web compared to baselines
  • Surpasses baselines by 8.9 to 14.0 absolute points on Mind2Web cross-domain evaluations, showing robustness to distribution shifts
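The highlights above mix relative improvements (percent of the baseline) with absolute points (a difference of success rates). A small sketch of the distinction follows; the success rates are hypothetical placeholders chosen for easy arithmetic, not results from the paper.

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Difference in success rate, in percentage points."""
    return new - baseline


def relative_gain(baseline: float, new: float) -> float:
    """Improvement as a percentage of the baseline success rate."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100


# Hypothetical success rates (percent), NOT taken from the paper.
baseline_rate, new_rate = 20.0, 30.0
points = absolute_gain(baseline_rate, new_rate)
percent = relative_gain(baseline_rate, new_rate)
```

With these placeholder values, a gain of 10 absolute points corresponds to a 50% relative improvement, so the two kinds of figures are not directly comparable.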
Breakthrough Assessment
8/10
Significant relative improvements on major benchmarks (WebArena, Mind2Web) and a practical approach to agent memory that bridges the gap between fixed few-shot examples and continuous learning.