Narrative Memory: High-level, abstract summaries of past full-task experiences used by the Manager for planning
Episodic Memory: Detailed, step-by-step records of subtask execution used by Workers for low-level action generation
ACI: Agent-Computer Interface—an abstraction layer that translates MLLM outputs into precise computer actions and provides grounded observations via accessibility trees
OSWorld: A benchmark environment for evaluating multimodal agents on open-ended computer tasks within a Linux operating system
WindowsAgentArena: A benchmark for evaluating agents on Windows OS tasks
RAG: Retrieval-Augmented Generation, a technique in which a model first retrieves relevant documents or memories and then conditions its answer or plan on them
Set-of-Mark Prompting: A visual prompting technique where objects in an image are overlaid with numeric tags to help the model reference them
Accessibility Tree: A hierarchical representation of a user interface's elements (buttons, text, etc.) provided by the OS for assistive technologies
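IOU: Intersection over Union, a metric for the overlap between two bounding boxes: the area of their intersection divided by the area of their union (0 for disjoint boxes, 1 for identical boxes)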
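
As a concrete illustration of the IOU entry, the following is a minimal sketch of the metric for two axis-aligned boxes. The (x1, y1, x2, y2) corner format and the function name iou are assumptions made for this example; they are not taken from the source.

```python
from typing import Tuple

# Assumed box format: (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Return intersection area divided by union area for two boxes."""
    # Corners of the overlapping rectangle, if there is one.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])

    # Clamp to zero so disjoint boxes contribute no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Two zero-area boxes have no meaningful overlap; report 0.
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction share an area of 1 and cover a combined area of 7, so iou((0, 0, 2, 2), (1, 1, 3, 3)) evaluates to 1/7.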