
Procedural Environment Generation for Tool-Use Agents

Michael Sullivan, Mareike Hartmann, Alexander Koller
Saarland University, Saarbrücken, Germany
arXiv (2025)
Agent RL Benchmark

📝 Paper Summary

Synthetic Data Generation · RL-based Agent Training
RandomWorld procedurally generates unlimited interactive, compositional tool-use environments by constructing type-constrained tool execution traces first and deriving instructions from them, enabling scalable online reinforcement learning.
Core Problem
Training effective tool-use agents via online RL requires massive amounts of interactive, compositional environments, but existing datasets are either static (non-callable), too simple (single-step), or manually crafted and thus unscalable.
Why it matters:
  • Online RL significantly improves agent generalization compared to SFT, but requires interactive environments that are dangerous or costly to build in the real world
  • Existing large datasets (e.g., ToolBench) often have high latency or non-interactive training sets, preventing effective RL loops
  • Hand-crafted benchmarks are high-quality but too small for large-scale training (AppWorld, for example, has only 750 tasks)
Concrete Example: A dataset like APIBench might contain thousands of tools but only asks the agent to make a single call, failing to teach the non-linear chaining required to 'find a comedy movie on Netflix under two hours and email the showtimes to a friend'.
Key Novelty
RandomWorld Procedural Generation Pipeline
  • Reverses the usual generation order: rather than writing a query and then solving it, RandomWorld first builds a valid 'trajectory skeleton' (a chain of tool calls) under a strict type system, then populates the environment with concrete values, and only then generates the instruction
  • Uses a fine-grained type hierarchy (e.g., separating 'movie-title' from 'string') to ensure synthesized tools are composable and inputs/outputs are semantically consistent without manual coding
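The two points above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the type names, the three tools, the `SAMPLE_VALUES` pools, and every function name here are hypothetical, and the template-based `describe` is only a stand-in for however RandomWorld actually generates instructions. The sketch shows the ordering: build a type-valid skeleton, fill in values, then derive the instruction from the finished trace.

```python
import random
from dataclasses import dataclass

# Hypothetical fine-grained type hierarchy (child -> parent).
PARENT = {"movie-title": "string", "showtime": "string", "genre": "string",
          "email-address": "string", "string": None}

def is_subtype(t, expected):
    """True if t equals expected or descends from it in PARENT."""
    while t is not None:
        if t == expected:
            return True
        t = PARENT.get(t)
    return False

@dataclass(frozen=True)
class Tool:
    name: str
    inputs: tuple  # expected type of each argument
    output: str    # type of the returned value

# Hypothetical tools, loosely following the movie example in this summary.
TOOLS = [
    Tool("find_movie", ("genre",), "movie-title"),
    Tool("get_showtime", ("movie-title",), "showtime"),
    Tool("send_email", ("email-address", "showtime"), "string"),
]

def build_skeleton(seed_types, tools, n_steps, rng):
    """Chain tool calls so every argument is fed by a type-compatible slot.

    Slots 0..len(seed_types)-1 hold user-provided values; each step appends
    one slot for its output, so later steps may combine several earlier ones.
    """
    slot_types = list(seed_types)
    steps = []
    for _ in range(n_steps):
        candidates = []
        for tool in tools:
            # For each argument, collect the slots whose type fits.
            options = [[i for i, t in enumerate(slot_types) if is_subtype(t, need)]
                       for need in tool.inputs]
            if all(options):  # every argument has at least one source
                candidates.append((tool, options))
        if not candidates:
            break
        tool, options = rng.choice(candidates)
        steps.append((tool, tuple(rng.choice(o) for o in options)))
        slot_types.append(tool.output)
    return steps

# Hypothetical value pools for the seed types.
SAMPLE_VALUES = {"genre": ["comedy", "drama"], "email-address": ["sam@example.com"]}

def populate(seed_types, steps, rng):
    """Sample concrete seed values; tool outputs get placeholder values."""
    values = [rng.choice(SAMPLE_VALUES[t]) for t in seed_types]
    for tool, _ in steps:
        values.append(f"<{tool.output} #{len(values)}>")
    return values

def describe(steps, values):
    """Template stand-in for instruction generation from the finished trace."""
    calls = [f"{tool.name}({', '.join(repr(values[i]) for i in args)})"
             for tool, args in steps]
    return "Do the following in order: " + "; ".join(calls)

rng = random.Random(0)
seed = ("genre", "email-address")
steps = build_skeleton(seed, TOOLS, 3, rng)
print(describe(steps, populate(seed, steps, rng)))
```

Because the instruction is written only after the trace exists, every generated task in this sketch has at least one valid solution, and the subtype check keeps a `movie-title` from being passed where an `email-address` is expected.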
Architecture
Figure 1: The RandomWorld generation pipeline flow
Evaluation Highlights
  • Sets a new state of the art (SoTA) on two NESTFUL benchmark metrics (this summary does not give the specific numbers)
  • Demonstrates that downstream agent performance scales with the amount of RandomWorld-generated training data
  • Generates environments with greater depth (tool diversity) and non-linear compositionality compared to existing procedural baselines
Breakthrough Assessment
8/10
Addresses the critical data bottleneck for agentic RL by automating the creation of interactive, consistent environments. Because each instruction is derived from a tool-call trace that is valid by construction, the 'skeleton-first' approach guarantees that every generated task is solvable.