CUGA: Computer Using Generalist Agent—the system proposed in this paper
WebArena: A benchmark environment for evaluating web-based agents on realistic tasks
AppWorld: A benchmark for evaluating agents on complex, multi-step workflows across diverse API-driven applications
MCP: Model Context Protocol—used here to back applications with servers generated from OpenAPI specifications
OpenAPI: A standard specification for defining RESTful APIs, used by the agent to understand available tools
Playwright: A library for browser automation used by the web sub-agent to control the browser
Accessibility Tree: A hierarchical representation of a user interface's elements, used by the agent to perceive the web page structure
Grounding: The process of linking an abstract reference (e.g., 'the submit button') to a concrete element in the environment (e.g., a specific DOM element ID)
LangGraph: A library for building stateful, multi-agent applications with LLMs, used for orchestration
LangChain: A framework for developing applications powered by language models
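
As an illustration of the Accessibility Tree and Grounding entries above, the following is a minimal sketch of how an abstract reference such as 'the submit button' could be resolved to a concrete element ID. It is not CUGA's implementation: the Node class, the walk and ground helpers, the matching rule (exact role, case-insensitive substring on the name), and the sample page are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """One node of a simplified, hypothetical accessibility tree."""
    node_id: str
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in depth-first order."""
    yield node
    for child in node.children:
        yield from walk(child)


def ground(tree: Node, role: str, name: str) -> Optional[str]:
    """Return the ID of the first node with the given role whose
    accessible name contains `name` (case-insensitive), or None."""
    for node in walk(tree):
        if node.role == role and name.lower() in node.name.lower():
            return node.node_id
    return None


# Hypothetical page: a checkout form with one text box and two buttons.
page = Node("n0", "main", "Checkout", [
    Node("n1", "textbox", "Email address"),
    Node("n2", "button", "Cancel"),
    Node("n3", "button", "Submit order"),
])

# Ground the abstract reference "the submit button" to a concrete ID.
print(ground(page, "button", "submit"))  # prints n3
```

A real agent would typically obtain the tree from the browser and use a language model or fuzzy matching rather than a substring test, but the input and output of the step are the same: a description goes in and an element identifier comes out.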