
EvoFlow: Evolving Diverse Agentic Workflows On The Fly

Gui-Min Zhang, Kaijie Chen, Guancheng Wan, Heng Chang, Hong Cheng, Kun Wang, Shuyue Hu, Lei Bai
Tongji University, Wuhan University, Tsinghua University, The Chinese University of Hong Kong, Nanyang Technological University, Shanghai AI Laboratory
arXiv.org (2025)
Agent Reasoning Benchmark

📝 Paper Summary

Automated Design of Agentic Systems (ADAS), Evolutionary Algorithms for Agents
EvoFlow automates the design of agentic systems by evolving a diverse population of heterogeneous workflows that trade off cost and performance, rather than searching for a single complex optimal architecture.
Core Problem
Existing pipelines for automated agentic design typically optimize a single objective (performance), producing homogeneous, expensive, and overly complex workflows that cannot scale down to simpler queries.
Why it matters:
  • Current methods produce 'one-size-fits-all' expensive workflows (often using only GPT-4) even for simple tasks
  • Ignoring LLM heterogeneity wastes the potential of smaller, cheaper models (e.g., Llama-3-70b) which can handle many subtasks effectively
  • Real-world queries vary in difficulty; always using a complex multi-agent debate system is inefficient and costly
Concrete Example: For a simple query like 'What is 2+2?', existing methods might invoke a complex Multi-agent Debate workflow costing many tokens. Ideally, the system should route this to a simple I/O agent, while reserving complex Debate/Reflexion workflows for graduate-level math problems.
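The routing idea in this example can be sketched as picking the cheapest workflow that is rated for a query's difficulty. This is a minimal illustration under stated assumptions, not EvoFlow's mechanism: the `WorkflowOption` catalogue, its costs, the integer difficulty scale, and the `route` helper are all hypothetical, and the difficulty estimate is simply taken as given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkflowOption:
    """Hypothetical catalogue entry; values below are illustrative only."""
    name: str
    cost_per_query: float  # relative cost, e.g. tokens normalized to the IO agent
    max_difficulty: int    # hardest (assumed) difficulty level it handles well


def route(difficulty: int, options: list[WorkflowOption]) -> WorkflowOption:
    """Return the cheapest workflow assumed capable of the given difficulty."""
    capable = [w for w in options if w.max_difficulty >= difficulty]
    if not capable:
        # Nothing is rated high enough: fall back to the most capable workflow.
        return max(options, key=lambda w: w.max_difficulty)
    return min(capable, key=lambda w: w.cost_per_query)


catalogue = [
    WorkflowOption("IO", 1.0, 1),
    WorkflowOption("CoT", 3.0, 2),
    WorkflowOption("Debate+Reflexion", 20.0, 5),
]

print(route(1, catalogue).name)  # IO (e.g. 'What is 2+2?')
print(route(5, catalogue).name)  # Debate+Reflexion (graduate-level math)
```

Estimating difficulty reliably is the hard part in practice; the sketch deliberately leaves it out so the cost-aware choice itself stays visible.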
Key Novelty
Niching Evolutionary Algorithm for Heterogeneous Agent Workflows
  • Treats workflow search as a multi-objective optimization problem (cost vs. performance) to generate a Pareto set of solutions rather than one single best workflow
  • Evolves 'operator nodes' (composite agent units like Debate or CoT) rather than just single prompts, so the search covers workflow topology and not only prompt wording
  • Uses 'niching' to maintain population diversity, ensuring the system keeps simple/cheap workflows for easy tasks and complex/expensive ones for hard tasks
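Treating the search as multi-objective means keeping every workflow that no other workflow beats on both cost and performance. The sketch below filters a toy population down to that Pareto set. The `Workflow` type, the operator names used as node labels, and every cost and score value are illustrative assumptions, not numbers from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    operators: tuple[str, ...]  # hypothetical operator-node sequence
    cost: float                 # inference cost, lower is better
    score: float                # task accuracy, higher is better


def dominates(a: Workflow, b: Workflow) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.cost <= b.cost and a.score >= b.score
    strictly_better = a.cost < b.cost or a.score > b.score
    return no_worse and strictly_better


def pareto_front(population: list[Workflow]) -> list[Workflow]:
    """Keep every workflow that no other member of the population dominates."""
    # A workflow never dominates itself, so comparing against the whole list is safe.
    return [w for w in population if not any(dominates(o, w) for o in population)]


population = [
    Workflow(("IO",), cost=1.0, score=0.60),
    Workflow(("IO", "IO"), cost=2.0, score=0.55),
    Workflow(("CoT",), cost=3.0, score=0.70),
    Workflow(("Debate",), cost=20.0, score=0.85),
    Workflow(("CoT", "SelfRefine"), cost=25.0, score=0.80),
]

front = pareto_front(population)
front_names = [" -> ".join(w.operators) for w in front]
print(front_names)  # ['IO', 'CoT', 'Debate']
```

`CoT -> SelfRefine` is dropped because `Debate` is both cheaper and more accurate, while the cheap `IO` workflow survives despite its low score, which is exactly the kind of option a single-objective search would discard.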
Architecture
Figure 3: The complete EvoFlow framework, illustrating the evolutionary cycle from population initialization to niching selection.
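The niching selection step at the end of this cycle can be illustrated with clearing, a classic niching technique: the fittest workflow in each neighborhood survives and near-duplicates are discarded, so structurally different (often cheaper) workflows are not crowded out. This is a swapped-in generic technique, not necessarily EvoFlow's exact procedure; the Jaccard distance over operator sets, the `radius`, the scalar `fitness` (a simplification of the cost/performance trade-off), and all example values are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    operators: frozenset[str]  # hypothetical: operator nodes the workflow uses
    fitness: float             # scalar fitness, higher is better (an assumption)


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """1 - (size of intersection) / (size of union); 0 if identical, 1 if disjoint."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def clearing_select(population: list[Candidate], radius: float, k: int) -> list[Candidate]:
    """Clearing-style niching: visit candidates from fittest to weakest and keep
    one only if it lies farther than `radius` from every winner kept so far."""
    winners: list[Candidate] = []
    for cand in sorted(population, key=lambda c: c.fitness, reverse=True):
        if len(winners) == k:
            break
        if all(jaccard_distance(cand.operators, w.operators) > radius for w in winners):
            winners.append(cand)
    return winners


population = [
    Candidate(frozenset({"Debate", "Reflexion"}), 0.86),
    Candidate(frozenset({"Debate", "Reflexion", "CoT"}), 0.85),
    Candidate(frozenset({"CoT"}), 0.70),
    Candidate(frozenset({"IO"}), 0.60),
]

survivors = clearing_select(population, radius=0.5, k=3)
print([sorted(c.operators) for c in survivors])
# [['Debate', 'Reflexion'], ['CoT'], ['IO']]
```

Plain top-3 truncation by fitness would keep the near-duplicate Debate+Reflexion+CoT variant and drop the IO workflow; clearing keeps IO because it occupies its own niche.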
Evaluation Highlights
  • +11.41% accuracy improvement on the MATH benchmark over vanilla GPT-4o-mini
  • Outperforms the state-of-the-art automated baseline AFlow by 6.42% on MATH while reducing inference cost by ~80%
  • Surpasses o1-preview on MATH using only open-source models (Llama-3.1, Qwen-2.5, etc.) at 12.4% of o1-preview's inference cost
Breakthrough Assessment
8/10
A significant shift from single-objective to multi-objective optimization in agent design. Shows that ensembles of open-source models can surpass a proprietary SOTA model (o1-preview) on MATH at a fraction of its inference cost.