AutoML-Agent: A Multi-Agent LLM Framework for Full-Pipeline AutoML

Patara Trirat, Wonyong Jeong, Sung Ju Hwang
DeepAuto.ai, KAIST, Seoul, South Korea
arXiv (2024)
Agent RAG Benchmark

📝 Paper Summary

Multi-agent automated machine learning (AutoML)
AutoML-Agent automates the entire machine learning pipeline, from data retrieval to model deployment, using specialized LLM agents coordinated through retrieval-augmented planning and multi-stage verification so that the generated code is runnable and high-quality.
Core Problem
Existing AutoML systems require substantial technical expertise to configure, while current LLM-based approaches usually handle only isolated pipeline steps (e.g., only hyperparameter optimization (HPO) or feature engineering) or rely on slow, expensive training-based search.
Why it matters:
  • Current fragmentation leads to suboptimal solutions because decisions in data processing affect model design and vice versa
  • High configuration barriers prevent domain experts without coding skills from building effective ML solutions
  • Training-based search methods are computationally prohibitive for practical, rapid development
Concrete Example: A user asks for a spam detection model. A standard LLM might generate code with hallucinated dependencies or miss an inference latency constraint stated in the request. A single-step tool might optimize the model architecture but fail to preprocess the text data correctly for that specific architecture, causing runtime errors.
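To make the "hallucinated dependencies" failure mode concrete, the following is a minimal sketch of one possible pre-execution check. It is an illustration only, not the verification procedure used by AutoML-Agent; the function name `missing_imports` is hypothetical. It parses generated Python source and reports top-level imports that cannot be resolved in the current environment.

```python
import ast
import importlib.util


def missing_imports(source: str) -> list[str]:
    """Return top-level modules imported by `source` that are not installed.

    Hypothetical helper for illustration; not part of AutoML-Agent.
    """
    tree = ast.parse(source)
    top_level = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # "import a.b.c" -> check that package "a" exists
            top_level.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from a.b import c" -> check "a"; relative imports are skipped
            top_level.add(node.module.split(".")[0])
    return sorted(
        name for name in top_level if importlib.util.find_spec(name) is None
    )


generated = "import json\nimport totally_made_up_pkg_xyz\nfrom os import path\n"
print(missing_imports(generated))
```

A check like this only catches missing packages; it says nothing about constraint compliance or runtime errors caused by mismatched preprocessing, which would need separate verification steps.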
Key Novelty
Retrieval-Augmented Planning with Role-Specific Decomposition
  • Instead of a single plan, the system retrieves external knowledge (like arXiv papers) to generate multiple diverse plans, then decomposes them into sub-tasks for specialized agents (Data Agent, Model Agent)
  • Uses 'prompting-based execution', in which agents simulate execution and return expected results and metrics without running code, allowing fast exploration before the final code is written
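The two ideas above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the paper's implementation: the names `Plan`, `make_plans`, `decompose`, and `select_best` are hypothetical, retrieved knowledge is represented as plain strings, and the `simulate` callable stands in for an LLM prompt that estimates a plan's outcome without executing any code.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Plan:
    description: str
    subtasks: dict[str, str] = field(default_factory=dict)  # agent role -> sub-task
    simulated_score: float = 0.0


def make_plans(request: str, snippets: list[str], n_plans: int) -> list[Plan]:
    """Create n_plans candidates, each conditioned on a different retrieved snippet."""
    if not snippets:
        raise ValueError("at least one retrieved snippet is required")
    return [
        Plan(description=f"{request} | informed by: {snippets[i % len(snippets)]}")
        for i in range(n_plans)
    ]


def decompose(plan: Plan, roles: list[str]) -> Plan:
    """Split a plan into one sub-task per specialized agent role."""
    plan.subtasks = {role: f"[{role}] part of: {plan.description}" for role in roles}
    return plan


def select_best(plans: list[Plan], simulate: Callable[[Plan], float]) -> Plan:
    """Score each plan with a simulated (not actually executed) run; keep the best."""
    for plan in plans:
        plan.simulated_score = simulate(plan)
    return max(plans, key=lambda p: p.simulated_score)


# Usage with a stub scorer; a real system would query an LLM here instead.
snippets = ["TF-IDF baseline", "fine-tuned transformer", "char n-grams"]
plans = [decompose(p, ["data", "model"]) for p in make_plans("spam detection", snippets, 3)]
best = select_best(plans, simulate=lambda p: float(len(p.description)))
print(best.description, best.subtasks.keys())
```

The design point is that scoring happens before any training or code generation, so many plans can be compared cheaply and only the selected one is turned into code.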
Architecture
Figure 2
The complete workflow of AutoML-Agent, detailing the interaction between the User, Agent Manager, and specialized agents (Prompt, Data, Model, Operation).
Evaluation Highlights
  • Achieves 87.1% success rate in generating runnable, compliant pipelines under constraint-aware settings, significantly outperforming GPT-4 (50.7%) and DS-Agent (37.5%)
  • Reduces search time by ~8x compared to tree-search-based methods (SELA) while maintaining comparable or superior model performance
  • Outperforms human expert baselines and AutoGluon on normalized performance scores across 7 downstream tasks including image, text, and tabular data
Breakthrough Assessment
8/10
Significantly advances LLM-based AutoML by successfully integrating the full pipeline (data to deployment) with a practical, training-free search mechanism. The high success rate on complex constraints is a strong differentiator.