CoA: Chain-of-Agents—a paradigm where a single model simulates multi-agent collaboration by dynamically activating different agent roles (e.g., Plan Agent, Search Agent) within one inference stream
AFM: Agent Foundation Model—the resulting model trained via CoA that supports native end-to-end complex problem solving
TIR: Tool-Integrated Reasoning—models trained to explicitly use tools (think-action-observation) but typically limited to a single agent perspective
Multi-Agent Distillation: The process of recording trajectories from a complex multi-agent system (like OAgents) and converting them into a linear sequence for supervised fine-tuning
DAPO: Decoupled Clip and Dynamic Sampling Policy Optimization—an RL algorithm used here to optimize the agent policy
ReAct: Reasoning and Acting—a framework where LLMs generate reasoning traces and task-specific actions in an interleaved manner
OAgents: A state-of-the-art open-source multi-agent framework used as the 'teacher' system for generating distillation data
Pass@1: A metric measuring the percentage of problems where the model's first attempt is correct
GRM: Generative Reward Model—used here to assess credibility scores for error-correction filtering
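
As a small illustration of the Pass@1 entry above, the metric can be computed from a list of per-problem outcomes. This is a minimal sketch, not the paper's evaluation code: the function name `pass_at_1` and the sample outcomes are hypothetical, and it assumes each problem has already been graded as correct or incorrect on its first attempt.

```python
def pass_at_1(first_attempt_correct):
    """Return the percentage of problems whose first attempt was correct.

    `first_attempt_correct` is an iterable of booleans, one per problem
    (hypothetical input format chosen for this sketch).
    """
    results = list(first_attempt_correct)
    if not results:
        raise ValueError("need at least one problem")
    # Count the correct first attempts and express them as a percentage.
    return 100.0 * sum(bool(r) for r in results) / len(results)


# Made-up outcomes for four problems (True = first attempt was correct).
outcomes = [True, False, True, True]
print(pass_at_1(outcomes))  # 75.0
```

Only the first attempt per problem counts here; later retries on the same problem would not change the score.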