
Optimizing Agentic Workflows using Meta-tools

Sami Abuzakuk, Anne-Marie Kermarrec, Rishi Sharma, Rasmus Moorits Veski, Martijn de Vos
École Polytechnique Fédérale de Lausanne
arXiv (2026)
Agent Reasoning Benchmark

📝 Paper Summary

Multi-call tool use with flexible plan, Agent workflow optimization
AWO identifies recurring sequences of tool calls in agent execution traces and compiles them into deterministic meta-tools, bypassing intermediate LLM reasoning steps to reduce cost and latency.
Core Problem
Agentic workflows often require many iterative reasoning steps and tool invocations, which raises operational cost and latency and creates more opportunities for hallucinations or failures.
Why it matters:
  • Operational expense: Repeated LLM inference for routine sub-tasks drives up token costs significantly.
  • Latency: User-facing applications suffer from the cumulative delay of multiple sequential reasoning-action cycles.
  • Reliability: More intermediate reasoning steps increase the probability of error or hallucination by the LLM.
Concrete Example: Creating a Spotify playlist requires sequential API calls (authorize, create, add items). An agent re-reasons at every step. AWO merges these into one 'create_and_populate_playlist' meta-tool, skipping intermediate reasoning.
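The playlist example can be sketched as a single composite function. This is a minimal illustration, not the paper's implementation: only the name `create_and_populate_playlist` comes from the summary, while `authorize`, `create_playlist`, and `add_items` are hypothetical stubs standing in for the real API tools.

```python
from typing import Dict, List

# Hypothetical stand-ins for the three API tools (assumptions, not a real SDK).
def authorize(user: str) -> str:
    return f"token-for-{user}"

def create_playlist(token: str, name: str) -> Dict[str, object]:
    return {"id": f"pl-{name}", "owner_token": token, "tracks": []}

def add_items(token: str, playlist: Dict[str, object], tracks: List[str]) -> Dict[str, object]:
    playlist["tracks"] = list(playlist["tracks"]) + tracks
    return playlist

def create_and_populate_playlist(user: str, name: str, tracks: List[str]) -> Dict[str, object]:
    """Meta-tool: runs the three steps in a fixed order with no LLM call in between."""
    token = authorize(user)
    playlist = create_playlist(token, name)
    return add_items(token, playlist, tracks)

playlist = create_and_populate_playlist("alice", "focus", ["t1", "t2"])
```

With the meta-tool registered, the agent makes one tool-selection decision for this segment instead of three.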
Key Novelty
Agent Workflow Optimization (AWO)
  • Analyzes historical execution traces to build a state graph where nodes represent tool histories.
  • Merges similar executions (horizontal merging) and identifies frequent sub-paths (vertical merging) to detect redundant patterns.
  • Compiles these patterns into 'meta-tools'—composite functions that execute multiple steps deterministically—allowing the agent to skip LLM calls for those segments.
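The steps above can be approximated in a few lines. The sketch below is a simplification under stated assumptions, not the authors' algorithm: horizontal merging is reduced to counting traces that share an identical tool-history prefix, vertical merging is reduced to finding contiguous tool sequences that recur in at least `min_support` traces, and all function names and the toy traces are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Trace = List[str]

def build_state_graph(traces: List[Trace]) -> Counter:
    """Nodes are tool-history prefixes; the count is how many traces visit each node."""
    visits: Counter = Counter()
    for trace in traces:
        for i in range(1, len(trace) + 1):
            visits[tuple(trace[:i])] += 1
    return visits

def frequent_subpaths(traces: List[Trace], min_len: int = 2,
                      min_support: int = 2) -> Dict[Tuple[str, ...], int]:
    """Count each contiguous tool sequence once per trace and keep the frequent ones."""
    support: Counter = Counter()
    for trace in traces:
        seen = set()
        for start in range(len(trace)):
            for end in range(start + min_len, len(trace) + 1):
                seen.add(tuple(trace[start:end]))
        support.update(seen)
    return {path: n for path, n in support.items() if n >= min_support}

# Toy traces (illustrative data only).
traces = [
    ["authorize", "create_playlist", "add_items"],
    ["authorize", "create_playlist", "add_items", "share"],
    ["authorize", "search"],
]
graph = build_state_graph(traces)
candidates = frequent_subpaths(traces)
# Prefer the longest frequent sequence as the meta-tool candidate.
best = max(candidates, key=lambda p: (len(p), candidates[p]))
```

Here `best` is the three-call sequence shared by the first two traces, which is the kind of pattern that would be compiled into a meta-tool.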
Architecture
Figure 4
The AWO workflow: Trace Mapping -> Horizontal Merging -> Vertical Merging -> Meta-tool Creation.
Evaluation Highlights
  • Reduces the number of LLM calls by up to 11.9% on agentic AI benchmarks.
  • Increases task success rate by up to 4.2 percentage points by shortening execution paths and reducing error opportunities.
  • Finds that over 14.3% of tasks in the AppWorld benchmark follow equivalent trajectories after 5 steps, indicating substantial redundancy across executions.
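A redundancy figure of this kind could be measured roughly as follows. This is one possible reading of the metric, not the paper's definition: it treats two trajectories as equivalent when their first `k` tool calls are identical, and the function name and toy traces are assumptions for illustration.

```python
from collections import Counter
from typing import List

def shared_prefix_fraction(traces: List[List[str]], k: int) -> float:
    """Fraction of all traces whose first k tool calls match at least one other trace."""
    if not traces:
        return 0.0
    prefixes = [tuple(t[:k]) for t in traces if len(t) >= k]
    counts = Counter(prefixes)
    shared = sum(1 for p in prefixes if counts[p] > 1)
    return shared / len(traces)

# Toy traces (illustrative data only); traces shorter than k never count as shared.
toy_traces = [["a", "b", "c"], ["a", "b", "d"], ["a", "x", "y"], ["z"]]
fraction = shared_prefix_fraction(toy_traces, k=2)
```

In this toy data, two of the four traces share the prefix `("a", "b")`, so the fraction is 0.5.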
Breakthrough Assessment
7/10
Solid practical optimization for agentic systems. Although it is not a fundamental architectural shift on the scale of ReAct, it provides a concrete, data-driven method to reduce cost and latency in production environments.