
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Weizhen Li, Jianbo Lin, Zhuosong Jiang, Jingyi Cao, Xinpeng Liu, Jiayu Zhang, Zhenqiang Huang, Qianben Chen, Weichen Sun, Qiexiang Wang, Hongxuan Lu, Tianrui Qin, Chenghao Zhu, Yi Yao, Shuying Fan, Xiaowan Li, Tiannan Wang, Pai Liu, King Zhu, He Zhu, Dingfeng Shi, Piaohong Wang, Yeyi Guan, Xiangru Tang, Minghao Liu, Yuchen Eleanor Jiang, Jian Yang, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou
OPPO AI Agent Team
arXiv (2025)
Agent RL Reasoning Benchmark

📝 Paper Summary

End-to-end Agent Foundation Models · Multi-Agent Systems · Tool-Integrated Reasoning
Chain-of-Agents (CoA) distills the collaborative capabilities of multi-agent systems into a single end-to-end model, enabling it to dynamically orchestrate role-playing and tool-use without complex prompt engineering.
Core Problem
Existing multi-agent systems rely on manual prompt engineering and rigid workflows, which incurs high computational overhead and prevents data-centric learning. Standard Tool-Integrated Reasoning (TIR) models, meanwhile, cannot support diverse role-playing or complex collaboration.
Why it matters:
  • Traditional multi-agent systems suffer from high token costs due to redundant inter-agent communication and struggle to generalize without extensive reconfiguration
  • Current LLMs are not natively trained to support multi-turn, multi-agent, and multi-tool workflows, relying instead on fragile prompt engineering
  • Bridging the gap between the flexibility of multi-agent systems and the efficiency of end-to-end models is crucial for scalable complex problem solving
Concrete Example: In a standard multi-agent system solving a deep research task, agents might exchange repetitive messages like 'Reviewing...' or 'Handing off to...', consuming tokens without progressing the state. CoA internalizes this handover, allowing a single model to switch from a 'Plan Agent' role to a 'Search Agent' role seamlessly within one generation stream.
Key Novelty
Chain-of-Agents (CoA) Paradigm & Agent Foundation Models (AFMs)
  • Interleaves reasoning thoughts, tool actions, and 'role' tokens within a single model's context window to simulate multi-agent collaboration end-to-end
  • Uses 'Multi-Agent Distillation' to convert execution trajectories from expert multi-agent systems (like OAgents) into linear training data for a single model
  • Employs progressive filtering and agentic reinforcement learning on verifiable tasks to refine tool orchestration and error correction
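The distillation idea above can be illustrated with a minimal sketch that flattens a multi-agent trajectory into one role-tagged training sequence and applies a simple answer-match filter. Everything here is an assumption made for illustration: the `Step` type, the tag format (`<plan>…</plan>`), the role names, and the exact-match filter are hypothetical and are not taken from the paper, whose actual data format and filtering criteria are not given in this summary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One turn of a multi-agent run (hypothetical structure)."""
    role: str     # e.g. "plan" or "search"; role names are illustrative
    content: str  # the agent's thought, tool call, or observation


def linearize(question: str, steps: List[Step]) -> str:
    """Flatten a multi-agent trajectory into a single role-tagged sequence."""
    parts = [f"<question>{question}</question>"]
    for step in steps:
        text = step.content.strip()
        if not text:
            # Skip empty hand-off turns that carry no information.
            continue
        parts.append(f"<{step.role}>{text}</{step.role}>")
    return "\n".join(parts)


def keep_trajectory(predicted: str, reference: str) -> bool:
    """Toy filter: keep a trajectory only if its final answer matches."""
    return predicted.strip().lower() == reference.strip().lower()


if __name__ == "__main__":
    trajectory = [
        Step("plan", "Split the task into a search and a summary."),
        Step("search", "   "),  # empty hand-off, dropped by linearize
        Step("search", "query: CoA paper"),
    ]
    print(linearize("Who wrote it?", trajectory))
```

A single model trained on sequences like this would emit the role tags itself, which is the sense in which the hand-off between agents is internalized rather than passed as messages between separate models.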
Evaluation Highlights
  • +3.8% improvement on GAIA (Level 3) over RL-enhanced WebDancer using a stronger backbone, achieving state-of-the-art 55.3% with Qwen-2.5-32B
  • Reduces inference cost (token consumption) by 84.6% compared to traditional multi-agent systems while maintaining competitive performance
  • Achieves a 59.8% solve rate on AIME 2025, outperforming previous TIR methods such as SimpleTIR and ReTool by more than 10.5%
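To put the reported 84.6% token reduction in perspective, the short calculation below converts it into the fraction of tokens that remain and the implied cost ratio. Only the 84.6% figure comes from the summary; the function names are hypothetical, and absolute token counts are not stated here, so the result is relative only.

```python
def remaining_fraction(reduction_pct: float) -> float:
    """Fraction of the baseline token budget still used after the reduction."""
    return 1.0 - reduction_pct / 100.0


def cost_ratio(reduction_pct: float) -> float:
    """How many times more tokens the baseline uses than the reduced system."""
    return 1.0 / remaining_fraction(reduction_pct)


if __name__ == "__main__":
    pct = 84.6  # figure reported in the summary
    print(f"remaining fraction: {remaining_fraction(pct):.3f}")
    print(f"baseline / reduced: {cost_ratio(pct):.2f}x")
```

Under this reading, the single model uses roughly 15% of the tokens of the multi-agent baseline, which corresponds to the baseline being about 6.5 times more expensive per task.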
Breakthrough Assessment
9/10
Significantly advances agentic AI by successfully distilling complex multi-agent dynamics into a single efficient model, achieving SOTA across diverse web and code benchmarks while drastically reducing compute costs.