
OpenHands: An Open Platform for AI Software Developers as Generalist Agents

Xingyao Wang, Boxuan Li, Yufan Song, Frank F. Xu, Xiangru Tang, Mingchen Zhuge, Jiayi Pan, Yueqi Song, Bowen Li, Jaskirat Singh, Hoang H. Tran, Fuqiang Li, Ren Ma, Mingzhang Zheng, Bill Qian, Yanjun Shao, Niklas Muennighoff, Yizhe Zhang, Binyuan Hui, Junyang Lin, Robert Brennan, Hao Peng, Heng Ji, Graham Neubig
Alibaba
arXiv (2024)
Agent Benchmark MM

📝 Paper Summary

Agentic AI, Software Engineering Agents, Web Agents
OpenHands is an extensible community platform that lets AI agents interact with the world the way human developers do: by writing code, using the command line, and browsing the web, all within a secure sandboxed environment.
Core Problem
Building agents that can safely and effectively develop software is difficult because they require complex toolchains, safe execution environments (sandboxes), and flexible interaction mechanisms that existing frameworks often lack.
Why it matters:
  • Software is the primary medium through which complex tasks in the world get done, yet agents struggle to modify code without causing unintended side effects on users' systems
  • Existing frameworks often lack the specialized tooling (Agent-Computer Interface) needed for on-the-fly debugging and information gathering
  • Creating and maintaining diverse tools for different agent implementations is a significant engineering burden
Concrete Example: A generalist agent may fail a task like 'fix a bug in this repo' if it cannot safely run code to reproduce the error, or if it has no browser to look up documentation. OpenHands addresses both gaps by providing a Docker-sandboxed bash shell and web browser.
Key Novelty
Unified Agent-Computer Interface (ACI) in a Sandboxed Runtime
  • Event Stream Architecture: Decouples the agent's logic from the environment, treating all interactions (actions, observations, user feedback) as a chronological sequence of events
  • Docker-Sandboxed Runtime: Provides a standardized, secure environment where agents can execute arbitrary bash commands and Python code without risking the host system
  • AgentSkills Library: A Python-based toolbox that allows agents to import and use specialized skills (e.g., file editing, PDF parsing) just like a human developer imports libraries
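The event-stream idea can be made concrete with a small sketch. It assumes that every action and observation is appended to one ordered log, that components such as the agent and the runtime subscribe to that log, and that each observation records the id of the action that caused it. All class and field names here (`Action`, `Observation`, `EventStream`, `cause`) are hypothetical and are not the OpenHands API. The runtime is faked and only echoes the command instead of executing it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Literal, Union

# Hypothetical event sources; the real platform may categorize them differently.
Source = Literal["agent", "user", "environment"]

@dataclass
class Action:
    source: Source
    kind: str       # e.g. "message", "run_bash" (illustrative names)
    content: str

@dataclass
class Observation:
    source: Source
    kind: str       # e.g. "cmd_output" (illustrative name)
    content: str
    cause: int      # id of the action that produced this observation

Event = Union[Action, Observation]

@dataclass
class EventStream:
    events: List[Event] = field(default_factory=list)
    subscribers: List[Callable[[int, Event], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[int, Event], None]) -> None:
        self.subscribers.append(callback)

    def add(self, event: Event) -> int:
        # The event gets its id (its position in the log) before anyone is notified.
        event_id = len(self.events)
        self.events.append(event)
        for callback in self.subscribers:
            callback(event_id, event)
        return event_id

stream = EventStream()

def fake_runtime(event_id: int, event: Event) -> None:
    # React only to executable agent actions; ignore messages and observations.
    if isinstance(event, Action) and event.source == "agent" and event.kind == "run_bash":
        stream.add(Observation("environment", "cmd_output",
                               f"ran: {event.content}", cause=event_id))

stream.subscribe(fake_runtime)
stream.add(Action("user", "message", "Fix the failing test"))
stream.add(Action("agent", "run_bash", "pytest -x"))
```

Because the agent only reads from and writes to the log, the fake runtime could be swapped for a real sandboxed one without changing any agent code. That swap is the decoupling the architecture aims for.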
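Docker sandboxing can be illustrated by routing every shell command through the Docker CLI, so that only a mounted workspace directory is exposed to the agent's code. This is a simplified sketch. The one-container-per-command design, the `python:3.12-slim` image, the memory cap and the helper names `build_sandbox_command` / `run_in_sandbox` are all hypothetical choices, not OpenHands' actual runtime configuration. The `docker run` flags used (`--rm`, `--memory`, `-v`, `-w`) are standard Docker CLI options.

```python
import shlex
import subprocess
from typing import List

def build_sandbox_command(command: str, workspace: str,
                          image: str = "python:3.12-slim",
                          memory: str = "2g") -> List[str]:
    """Build a `docker run` invocation that runs `command` in a throwaway container.

    Hypothetical setup: the host `workspace` is mounted at /workspace and is the
    only host directory the container can see.
    """
    return [
        "docker", "run",
        "--rm",                           # remove the container once the command exits
        "--memory", memory,               # cap memory so runaway code cannot exhaust the host
        "-v", f"{workspace}:/workspace",  # expose only the workspace directory
        "-w", "/workspace",               # start inside the mounted workspace
        image,
        "bash", "-c", command,
    ]

def run_in_sandbox(command: str, workspace: str,
                   timeout: float = 120.0) -> subprocess.CompletedProcess:
    """Execute the command through the Docker CLI (requires Docker on the host)."""
    return subprocess.run(build_sandbox_command(command, workspace),
                          capture_output=True, text=True, timeout=timeout)

# Inspect the command without running it (no Docker needed for this step).
cmd = build_sandbox_command("python -m pytest -x", "/tmp/repo")
print(shlex.join(cmd))
```

A practical runtime would more likely keep one long-lived container and shell session per task, so that state such as the working directory and installed packages persists between commands. The per-command version above only shows the isolation boundary.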
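The AgentSkills idea, where the agent writes ordinary Python that calls pre-imported helper functions, can be sketched as follows. The skill names (`create_file`, `search_file`, `replace_in_file`) and `execute_agent_code` are hypothetical stand-ins and are not the library's actual API. The sketch also uses plain `exec()`, which provides no isolation, so it assumes it is already running inside a sandbox.

```python
import contextlib
import io
import os
import tempfile
from typing import List

# Hypothetical skills: plain functions the agent can call like library code.
def create_file(path: str, content: str = "") -> str:
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return f"[created {path}]"

def search_file(term: str, path: str) -> List[int]:
    """Return the 1-based line numbers in `path` that contain `term`."""
    with open(path, encoding="utf-8") as f:
        return [i for i, line in enumerate(f, start=1) if term in line]

def replace_in_file(path: str, old: str, new: str) -> bool:
    """Replace the first occurrence of `old`; return False if it is absent."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    if old not in text:
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(text.replace(old, new, 1))
    return True

SKILLS = {"create_file": create_file, "search_file": search_file,
          "replace_in_file": replace_in_file}

def execute_agent_code(code: str) -> str:
    """Run agent-written code with the skills preloaded and capture its stdout.

    NOTE: exec() is not a security boundary; this assumes an outer sandbox.
    """
    namespace = dict(SKILLS)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)
    return buffer.getvalue()

# Example: the "agent" creates a buggy file, locates the bug, and patches it.
workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "calc.py")
agent_code = f"""
create_file({target!r}, "def add(a, b):\\n    return a - b\\n")
print(search_file("return", {target!r}))
replace_in_file({target!r}, "a - b", "a + b")
print(search_file("a + b", {target!r}))
"""
output = execute_agent_code(agent_code)
```

Exposing skills as plain functions means new tools can be added without changing the agent's action format. The agent simply calls another function in the code it already writes.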
Evaluation Highlights
  • CodeActAgent (using Claude-3.5-Sonnet) achieves 26.0% on SWE-bench Lite, comparable to specialized software-engineering agents such as Aider (26.3%)
  • CodeActAgent (using Claude-3.5-Sonnet) scores 15.3% on WebArena, outperforming the WebArena Agent baseline (14.4%) without task-specific tuning
  • CodeActAgent (using Claude-3.5-Sonnet) achieves 52.0% on GPQA (Graduate-Level Google-Proof Q&A), significantly outperforming the GPT-4 few-shot baseline (38.8%)
Breakthrough Assessment
9/10
OpenHands provides a critical infrastructure layer (sandboxing, event streams, skill libraries) that standardizes how agents interact with software, enabling a large community effort (32K GitHub stars) to build generalist agents.