
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning

Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Juntao Tan, Thai Hoang, Liangwei Yang, Yihao Feng, Zuxin Liu, T. Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong
Salesforce Research
arXiv.org (2024)
Agent · Benchmark · Reasoning

📝 Paper Summary

LLM-augmented Autonomous Agents (LAAs) · Multi-agent orchestration · Agent architecture comparison
BOLAA orchestrates multiple specialist agents (e.g., search-only and click-only) under a central controller to outperform single generalist agents on complex decision-making tasks.
Core Problem
Existing studies of LLM-augmented autonomous agents (LAAs) lack comprehensive comparisons of agent architectures (e.g., ReAct vs. planning-based designs), and single agents struggle to scale to complex open-domain tasks because of context-length limits and hallucination.
Why it matters:
  • The optimal agent architecture remains undetermined, and there is limited understanding of how different LLM backbones perform across different agent designs.
  • Single agents handling multiple action types (reasoning, searching, clicking) often fail in complex environments due to divided attention and context constraints.
  • Existing benchmarks often fail to jointly evaluate the interplay between agent architecture and the underlying LLM backbone.
Concrete Example: In a web navigation task, a single agent must decide whether to 'click' a button or 'search' for a query. A generalist agent might hallucinate a click action when it should search. BOLAA splits this into a 'search agent' and 'click agent', ensuring each focuses only on its specific action type.
Key Novelty
BOLAA (Multi-Agent Orchestration with Controller)
  • Decouples complex tasks into distinct labor agents (e.g., one for searching, one for clicking) managed by a central controller.
  • The controller selects the most relevant labor agent for the current state and manages communication, rather than a single LLM trying to handle all action types.
  • Provides a unified benchmark comparing 6 distinct agent architectures (ZeroShot, ReAct, PlanAct, etc.) across multiple open-source and proprietary LLMs.
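To make the controller/labor-agent split concrete, here is a minimal Python sketch of the idea, not the authors' implementation. The LLM calls are replaced by stub functions, and the class names (`LaborAgent`, `Controller`), the prompt wording, and the rule-based `toy_select` routing policy are all hypothetical; the paper's actual controller logic is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical LLM interface: a function from prompt text to completion text.
LLM = Callable[[str], str]


@dataclass
class LaborAgent:
    """Specialist agent restricted to a single action type (e.g. 'search')."""
    action_type: str
    llm: LLM

    def act(self, observation: str, history: List[str]) -> str:
        # The prompt tells the LLM it may only produce this one action type.
        prompt = (
            f"You may only perform '{self.action_type}' actions.\n"
            f"Previous actions: {history}\n"
            f"Observation: {observation}\n"
            "Action argument:"
        )
        argument = self.llm(prompt).strip()
        # Format as action_type[argument], e.g. search[red running shoes].
        return f"{self.action_type}[{argument}]"


@dataclass
class Controller:
    """Routes each step to one labor agent and keeps the shared action history."""
    agents: Dict[str, LaborAgent]
    select: Callable[[str], str]  # maps an observation to an agent name
    history: List[str] = field(default_factory=list)

    def step(self, observation: str) -> str:
        name = self.select(observation)
        if name not in self.agents:
            raise ValueError(f"controller selected unknown agent: {name!r}")
        action = self.agents[name].act(observation, self.history)
        self.history.append(action)
        return action


def toy_select(observation: str) -> str:
    # Toy routing rule (assumption): search while a search box is shown, else click.
    return "search" if "[Search]" in observation else "click"


if __name__ == "__main__":
    controller = Controller(
        agents={
            # Stub "LLMs" that always return a fixed argument (hypothetical).
            "search": LaborAgent("search", llm=lambda prompt: "red running shoes"),
            "click": LaborAgent("click", llm=lambda prompt: "item-1"),
        },
        select=toy_select,
    )
    print(controller.step("Home page [Search]"))       # search[red running shoes]
    print(controller.step("Results: item-1, item-2"))  # click[item-1]
```

Each labor agent sees a prompt that permits only its own action type, which is the property the summary credits with avoiding the search/click confusion of a single generalist agent. In a real system the stub lambdas would be replaced by calls to an actual LLM backbone, and the routing rule by whatever selection mechanism the controller uses.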
Architecture
Figure 3: The BOLAA architecture, showing the controller and the pool of labor agents.
Evaluation Highlights
  • BOLAA achieves the highest rewards on WebShop decision-making tasks, outperforming the 5 other architectures (ReAct, PlanAct, etc.), especially with high-performing LLMs.
  • BOLAA with a smaller 3B model (fastchat-t5-3b) performs comparably to single-agent architectures using much larger models, demonstrating the efficiency of specialized orchestration.
  • Llama-2-70b performs best under the BOLAA architecture, while Llama-2-13b performs best with PlanAct, suggesting that the optimal architecture depends on model size.
Breakthrough Assessment
7/10
Provides a valuable, comprehensive benchmark of agent architectures often taken for granted. The proposed BOLAA architecture validates the 'mixture of experts/agents' intuition for complex tasks.