
The 2025 AI Agent Index: Documenting Technical and Safety Features of Deployed Agentic AI Systems

Leon Staufer, Kevin Feng, Kevin Wei, Luke Bailey, Yawen Duan, Mick Yang, A. Pinar Ozisik, Stephen Casper, Noam Kolt
University of Cambridge, University of Washington, Harvard Law School, Stanford University, Concordia AI, University of Pennsylvania, Massachusetts Institute of Technology, Hebrew University of Jerusalem
arXiv (2026)
Agent Benchmark

📝 Paper Summary

AI Governance and Transparency, AI Agent Evaluation, Deployed Agentic Systems
The 2025 AI Agent Index systematically documents 30 high-impact deployed agentic systems across six categories to reveal critical gaps in transparency, safety practices, and evaluation standards.
Core Problem
Despite rapid deployment and economic investment in agentic AI, the ecosystem remains opaque, with little public information available to researchers and policymakers regarding system capabilities, development processes, and safety guardrails.
Why it matters:
  • Policymakers lack data on who is developing impactful systems and what risks they pose, hindering effective regulation
  • Researchers struggle to track rapid evolution in the agent ecosystem due to inconsistent documentation
  • Highly capable agents present unique risks (e.g., direct harm via tool use) that are distinct from those of chat-based systems, yet safety practices remain obscure
Concrete Example: While chatbots cause harm only if users act on outputs, agentic systems can directly execute actions like autonomously hacking websites. Yet, most developers share little information about what guardrails prevent these specific autonomous risks.
Key Novelty
The 2025 AI Agent Index
  • Systematically annotates 30 state-of-the-art deployed agents across 45 distinct fields covering legal, technical, autonomy, ecosystem, evaluation, and safety dimensions
  • Introduces rigorous inclusion criteria combining agency definitions (autonomy, goal complexity, environmental interaction, generality) with real-world impact metrics (search volume, market cap)
  • Reveals ecosystem-wide trends by analyzing transparency levels and development practices across three distinct agent types: chat applications, browser agents, and enterprise workflows
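The two-part inclusion test described above (agency definitions plus real-world impact) can be pictured as a simple filter. The following is only an illustrative sketch, not the authors' procedure: the `Candidate` fields, the yes/no judgement per agency dimension, the "either impact metric suffices" rule, and the threshold parameters are all assumptions made for the example, since the summary does not state how the criteria were scored or combined.

```python
from dataclasses import dataclass

# The four agency dimensions named in the summary.
AGENCY_DIMENSIONS = (
    "autonomy",
    "goal_complexity",
    "environmental_interaction",
    "generality",
)


@dataclass
class Candidate:
    """Hypothetical record for one candidate system (field names are assumptions)."""
    name: str
    agency: dict[str, bool]   # assumed: one yes/no judgement per agency dimension
    search_volume: int        # assumed impact proxy
    market_cap_usd: float     # assumed impact proxy


def meets_agency(c: Candidate) -> bool:
    # Assumption: a system must satisfy every agency dimension.
    return all(c.agency.get(dim, False) for dim in AGENCY_DIMENSIONS)


def meets_impact(c: Candidate, min_search_volume: int, min_market_cap_usd: float) -> bool:
    # Assumption: clearing either impact threshold is sufficient.
    return c.search_volume >= min_search_volume or c.market_cap_usd >= min_market_cap_usd


def select(candidates, min_search_volume, min_market_cap_usd):
    """Keep only candidates that pass both the agency and the impact test."""
    return [
        c for c in candidates
        if meets_agency(c) and meets_impact(c, min_search_volume, min_market_cap_usd)
    ]
```

Keeping the agency and impact checks as separate functions mirrors the way the summary describes them as two distinct criteria, so either one could be swapped for a different scoring rule without touching the other.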
Evaluation Highlights
  • Indexed 30 highly agentic products selected from 95 candidates based on strict agency and impact criteria
  • Annotated 45 distinct information fields per system, revealing that most developers share minimal information on safety and societal impact
  • Reported a 23% response rate from companies contacted for verification, lower than for the previous year's index
Breakthrough Assessment
9/10
A critical resource for the field. While not a technical architecture paper, it establishes the standard for documenting and analyzing the rapidly growing landscape of deployed AI agents.