← Back to Paper List

Unlocking Implicit Experience: Synthesizing Tool-Use Trajectories from Text

Zhihao Xu, Rumei Li, Jiahuan Li, Rongxiang Weng, Jingang Wang, Xunliang Cai, Xiting Wang
Renmin University of China
arXiv.org (2026)
Agent Benchmark

📝 Paper Summary

Synthetic data generation for agents Tool-use post-training
GEM extracts multi-turn tool-use trajectories from raw text corpora by identifying implicit workflows, synthesizing corresponding tools, and generating grounded user-agent interactions without requiring predefined APIs.
Core Problem
Training autonomous agents requires diverse, realistic multi-turn tool-use data, but existing methods rely on expensive, limited sets of predefined APIs, restricting generalization.
Why it matters:
  • Real-world tool-use trajectories are scarce and hard to collect manually.
  • Simulation methods based on fixed API sets fail to cover the broad range of scenarios needed for agents to generalize to unseen environments.
  • Current LLMs struggle with realistic multi-turn interactions involving ambiguous instructions or long-context dependencies.
Concrete Example: A raw text about 'hospital reimbursement claims' contains an implicit procedure (step-by-step logic) and implicit tools (forms, submissions) but is not structured as an agent trajectory. Existing methods miss this data source, whereas GEM extracts the workflow to create a simulated reimbursement agent.
Key Novelty
Text-Based Extraction Paradigm (GEM Pipeline)
  • Treats raw text corpora as a source of implicit 'experience' containing procedural knowledge, rather than just knowledge facts.
  • Synthesizes *both* the tools (APIs) and the interaction trajectories simultaneously from text, bypassing the need for a pre-existing tool library.
  • Distills the multi-stage generation pipeline into a single 'Trajectory Synthesizer' model that converts text to agent data end-to-end.
Architecture
Architecture Figure Figure 4
The GEM data synthesis pipeline, illustrating the flow from raw text to validated trajectories.
Evaluation Highlights
  • +16.5% improvement on the BFCL V3 Multi-turn benchmark using GEM-32B compared to baselines.
  • Achieves comparable performance on Tau-bench (Airline and Retail) using out-of-domain synthetic data as models trained on in-domain data, showing strong generalization.
  • The distilled Trajectory Synthesizer matches the quality of the full multi-stage pipeline while significantly reducing inference costs.
Breakthrough Assessment
8/10
Proposes a significant paradigm shift from tool-centered simulation (needing fixed APIs) to text-centered extraction (creating APIs from text). The generalization results on Tau-bench are particularly impressive.
×