Multi-View Encoders for Performance Prediction in LLM-Based Agentic Workflows

Patara Trirat, Wonyong Jeong, Sung Ju Hwang
Korea Advanced Institute of Science and Technology
arXiv (2025)
Agent | Benchmark | Pretraining

📝 Paper Summary

Automated Generation of Agentic Workflows | Performance Prediction for Agents
Agentic Predictor accelerates the design of multi-agent systems by using a lightweight neural network to estimate the performance of candidate workflows without running expensive LLM-based evaluations.
Core Problem
Finding optimal configurations for agentic workflows (e.g., topology, prompts, tools) is computationally prohibitive because evaluating each candidate requires expensive, slow, repeated execution of large language models (LLMs).
Why it matters:
  • Developing effective agentic systems currently relies on trial-and-error manual engineering or costly search algorithms that waste computational resources validating poor candidates.
  • Existing automated methods such as GPTSwarm and ADAS incur large API costs because they fully execute every candidate workflow during the search process.
  • Labeled data for agentic workflows (successful vs. failed runs) is extremely scarce, making it difficult to train standard supervised predictors.
Concrete Example: When designing a coding agent, a search algorithm might generate thousands of variations in communication patterns (e.g., 'Debate' vs. 'Code-Review'). Evaluating all of them requires running GPT-4 on a benchmark for every single variation, costing hundreds of dollars. Agentic Predictor instead estimates the success rate of each variation with a cheap forward pass of its lightweight network, without executing the workflow.
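The way a predictor saves cost can be sketched as a screening loop: score every candidate with the cheap surrogate, then spend the expensive evaluation budget only on the top-ranked few. This is a minimal illustration under that assumption, not the paper's search procedure; `screen_candidates`, `predict`, `evaluate` and the toy success rates are hypothetical.

```python
from typing import Callable, List, Tuple


def screen_candidates(
    candidates: List[str],
    predict: Callable[[str], float],   # cheap surrogate score (hypothetical)
    evaluate: Callable[[str], float],  # expensive ground truth, e.g. an LLM benchmark run
    budget: int,
) -> Tuple[str, float, int]:
    """Rank all candidates by predicted score, run the expensive evaluation
    only on the top-`budget` ones, and return (best, its score, #evaluations)."""
    ranked = sorted(candidates, key=predict, reverse=True)
    shortlisted = ranked[:budget]
    scores = {c: evaluate(c) for c in shortlisted}
    best = max(scores, key=scores.get)
    return best, scores[best], len(scores)


if __name__ == "__main__":
    # Toy "true" success rates, visible only through evaluate().
    true_rate = {"debate": 0.62, "code-review": 0.71,
                 "self-refine": 0.55, "single-agent": 0.48}
    # Deliberately imperfect surrogate guesses.
    guess = {"debate": 0.66, "code-review": 0.64,
             "self-refine": 0.50, "single-agent": 0.45}

    best, score, n_evals = screen_candidates(
        list(true_rate), guess.__getitem__, true_rate.__getitem__, budget=2)
    print(best, score, n_evals)  # code-review 0.71 2
```

Even though the surrogate ranks 'debate' above 'code-review', the true best workflow still lands in the shortlist, so only two of four expensive runs are needed. The savings depend entirely on how well the predictor ranks candidates, which is what the evaluation below measures.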
Key Novelty
Agentic Predictor (Multi-View Encoder + Cross-Domain Pretraining)
  • Encodes workflows using three complementary views: the 'Graph View' for agent topology, the 'Code View' for logic/control flow, and the 'Prompt View' for semantic instructions.
  • Uses cross-domain unsupervised pretraining to learn generalizable workflow representations from unlabeled data, allowing the predictor to work effectively even with very few ground-truth performance labels.
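A minimal sketch of the multi-view idea, assuming each view is mapped to a fixed-size vector and the three vectors are concatenated before a predictor head. The paper's encoders are learned (and pretrained across domains); here hashed bag-of-words vectors and one round of mean message passing merely stand in for them. All names (`encode_workflow`, `graph_view`, `hash_bow`, `predict_success`, `DIM`) and the toy workflow are hypothetical.

```python
import math
import zlib
from typing import Dict, List, Sequence, Tuple

DIM = 16  # per-view embedding size (hypothetical choice)


def hash_bow(text: str, dim: int = DIM) -> List[float]:
    """L2-normalised hashed bag-of-words; a stand-in for a learned code/prompt encoder.
    crc32 is used instead of hash() so vectors are stable across runs."""
    vec = [0.0] * dim
    for tok in text.lower().split():
        vec[zlib.crc32(tok.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def graph_view(nodes: Dict[str, str], edges: Sequence[Tuple[str, str]],
               dim: int = DIM) -> List[float]:
    """Graph view: one round of mean-neighbour message passing over agent nodes
    (features = hashed role names), then mean pooling. A stand-in for a trained GNN."""
    feats = {n: hash_bow(role, dim) for n, role in nodes.items()}
    incoming: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, dst in edges:
        incoming[dst].append(src)  # messages flow along edge direction
    updated = []
    for n, h in feats.items():
        msgs = [feats[m] for m in incoming[n]] or [h]  # isolated node keeps itself
        agg = [sum(col) / len(msgs) for col in zip(*msgs)]
        updated.append([0.5 * a + 0.5 * b for a, b in zip(h, agg)])
    return [sum(col) / len(updated) for col in zip(*updated)]


def encode_workflow(nodes: Dict[str, str], edges: Sequence[Tuple[str, str]],
                    code: str, prompts: Sequence[str]) -> List[float]:
    """Concatenate graph, code and prompt views into one 3 * DIM feature vector."""
    return graph_view(nodes, edges) + hash_bow(code) + hash_bow(" ".join(prompts))


def predict_success(features: List[float], weights: List[float],
                    bias: float = 0.0) -> float:
    """Linear head + sigmoid mapping the fused features to a success probability."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    nodes = {"a": "planner", "b": "coder", "c": "reviewer"}
    edges = [("a", "b"), ("b", "c"), ("c", "b")]
    code = "for step in range(3): draft = coder(plan); review = reviewer(draft)"
    prompts = ["Write a Python function", "Review the code for bugs"]
    x = encode_workflow(nodes, edges, code, prompts)
    print(len(x))                              # 48
    print(predict_success(x, [0.0] * len(x)))  # 0.5
```

With untrained all-zero head weights the output is exactly 0.5. In the setting described above, the encoders would first be pretrained on unlabeled workflows from several domains, and the head would then be fit on the few available labelled runs.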
Evaluation Highlights
  • Improves prediction accuracy by up to 6.90% over strong graph-based baselines (averaged across three domains).
  • Increases workflow utility (ranking quality) by up to 5.87% compared to baselines.
  • Outperforms GNN-based baselines like Graph Transformer and GIN in predicting the success of unseen agentic workflows.
Breakthrough Assessment
7/10
Novel application of Neural Architecture Search (NAS) principles to agentic workflows. The multi-view encoding and pretraining strategy directly targets two obstacles specific to agent systems: heterogeneous workflow representations and scarce labeled data.