
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution

Jiahao Qiu, Xuan Qi, Tongcheng Zhang, Xinzhe Juan, Jiacheng Guo, Yifu Lu, Yimin Wang, Zixin Yao, Qihan Ren, Xun Jiang, Xing Zhou, Dongrui Liu, Ling Yang, Yue Wu, Kaixuan Huang, Shilong Liu, Hongru Wang, Mengdi Wang
AI Lab, Princeton University; IIIS, Tsinghua University; Shanghai Jiao Tong University
arXiv (2025)
Agent Reasoning Benchmark MM

📝 Paper Summary

Self-evolving, Agentic reasoning, Tool Creation, Generalist Agent
Alita is a generalist agent that solves complex tasks by autonomously generating, executing, and encapsulating code into reusable Model Context Protocol (MCP) tools rather than relying on large libraries of predefined tools.
Core Problem
Existing generalist agents rely heavily on extensive manual engineering of predefined tools and static workflows, which limits adaptability to new domains and creates compatibility issues.
Why it matters:
  • Predefined toolkits cannot cover the infinite variety of real-world tasks (incomplete coverage)
  • Hardcoded workflows constrain the agent's ability to creatively compose tools for novel problems (limited flexibility)
  • Manual tool integration often faces interface mismatches, especially with non-Python tools
Concrete Example: In a YouTube 360 VR video task, a standard agent might fail because it lacks a dedicated subtitle extraction tool. Alita recognizes the gap, searches for a solution on its own, finds the 'youtube-transcript-api' library, generates a script that uses it, creates a Conda environment to run the script, and encapsulates the new capability as a reusable tool.
Key Novelty
Minimal Predefinition + Maximal Self-Evolution via MCPs
  • Instead of shipping with 100+ tools, Alita starts with only a web agent and a code interpreter, then builds its own tools on the fly using the Model Context Protocol (MCP)
  • Implements a self-reinforcing loop where valid generated code is not just executed once but wrapped into an MCP server for future reuse by itself or other agents
  • Uses 'MCP Brainstorming' to self-assess capability gaps before execution, proactively deciding whether to search for new external libraries or write custom scripts
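The loop described in the bullets above can be sketched in a few lines. This is only an illustrative toy, not Alita's implementation: `ToolRegistry`, `generate_tool`, and `solve` are hypothetical names, the registry is a plain dictionary standing in for a store of MCP servers, and the generation step is a placeholder for the real web search and code synthesis.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class ToolRegistry:
    """Hypothetical stand-in for a store of generated MCP tools."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    created: int = 0  # number of tools generated (as opposed to reused)

    def find(self, capability: str) -> Optional[Callable[[str], str]]:
        return self.tools.get(capability)

    def register(self, capability: str, tool: Callable[[str], str]) -> None:
        self.tools[capability] = tool
        self.created += 1


def generate_tool(capability: str) -> Callable[[str], str]:
    """Placeholder for the search-and-code-generation step.

    A real agent would search the web, write a script, and test it in an
    isolated environment; here we only return a trivial function.
    """
    def tool(task_input: str) -> str:
        return f"[{capability}] processed: {task_input}"
    return tool


def solve(task_input: str, capability: str, registry: ToolRegistry) -> str:
    # 1. Brainstorm: is there already a tool for the needed capability?
    tool = registry.find(capability)
    if tool is None:
        # 2. Gap detected: create the tool, then 3. keep it for reuse.
        tool = generate_tool(capability)
        registry.register(capability, tool)
    # 4. Execute with the new or reused tool.
    return tool(task_input)


registry = ToolRegistry()
first = solve("video A", "subtitle_extraction", registry)
second = solve("video B", "subtitle_extraction", registry)
print(first)
print(second)
print(registry.created)  # the second call reuses the tool built by the first
```

The point of the sketch is the reuse path: the second task needs the same capability, so no new tool is generated and the registry count stays at one.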
Architecture
Figure 3: The architectural workflow of Alita, detailing the cycle of brainstorming, tool creation, and execution.
Evaluation Highlights
  • Achieves 75.15% pass@1 and 87.27% pass@3 on the GAIA benchmark, outperforming OpenAI Deep Research (67.36% pass@1)
  • Reusing Alita-generated MCPs triples the accuracy of smaller models (GPT-4o-mini) on hard tasks (GAIA Level 3) from 3.85% to 11.54%
  • Surpasses OctoTools on MathVista (74.00% vs 68%) and PathVQA (52.00% vs 47%) despite using minimal predefined tooling
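For readers unfamiliar with the pass@1 and pass@3 figures above, the following sketch shows one common way such scores are computed. It assumes the "solved if any of the first k attempts is correct" reading of pass@k; the paper's exact protocol is not stated in this summary. The function name `pass_at_k` and the toy outcomes are hypothetical and are not data from the paper. The last lines only check the "triples" claim using the two percentages quoted above.

```python
from typing import Dict, List


def pass_at_k(attempts: Dict[str, List[bool]], k: int) -> float:
    """Percentage of tasks with at least one correct result in the first k attempts."""
    if not attempts:
        raise ValueError("no tasks given")
    solved = sum(1 for results in attempts.values() if any(results[:k]))
    return 100.0 * solved / len(attempts)


# Hypothetical outcomes for four tasks, three attempts each (not paper data).
toy = {
    "task_1": [True, True, True],
    "task_2": [False, True, False],
    "task_3": [False, False, False],
    "task_4": [False, False, True],
}
p1 = pass_at_k(toy, 1)
p3 = pass_at_k(toy, 3)
print(p1, p3)

# Ratio of the GAIA Level 3 accuracies quoted above (11.54% vs 3.85%).
ratio = 11.54 / 3.85
print(round(ratio, 2))
```

On the toy data only one task is solved on the first attempt, while three are solved within three attempts, which is why pass@3 can never be lower than pass@1.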
Breakthrough Assessment
8/10
Strong conceptual shift from static tool libraries to dynamic tool generation. The performance on GAIA is impressive, and the 'distillation' of capabilities via generated MCPs to smaller models is a significant practical contribution.