
ToolNet: Connecting Large Language Models with Massive Tools via Tool Graph

Xukun Liu, Zhiyuan Peng, Xiaoyuan Yi, Xing Xie, Lirong Xiang, Yuchen Liu, Dongkuan Xu
Northwestern University, North Carolina State University, Microsoft Research Asia
arXiv (2024)
Agent Reasoning Benchmark

📝 Paper Summary

Multi-call tool use with flexible plan Self-evolving Agentic reasoning
ToolNet organizes thousands of tools into a weighted directed graph, allowing LLMs to navigate sparse tool transitions rather than processing the entire tool library at every step.
Core Problem
Existing methods like ReAct format all available tools as a flat list in the context. This fails to scale to thousands of tools because of token limits, and the long list confuses LLMs.
Why it matters:
  • LLMs hallucinate and fail to select correct tools when presented with massive, flat tool libraries
  • Token consumption scales linearly with tool count, making current in-context learning approaches cost-prohibitive for large-scale real-world APIs
  • Static tool lists cannot adapt to tool failures or updates without manual intervention
Concrete Example: In ToolBench, a task might require a specific sequence of API calls. A standard method inputs 3000+ tool descriptions at every step. ToolNet, realizing that the 'Weather' tool is rarely followed by 'Spotify', only presents the few statistically likely successors, drastically cutting context size.
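The idea of showing only statistically likely successors can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the graph is built by counting consecutive tool pairs in past trajectories, and the function names (`build_tool_graph`, `top_successors`), the tool names other than 'Weather' and 'Spotify', and the toy trajectories are all hypothetical.

```python
from collections import Counter, defaultdict


def build_tool_graph(trajectories):
    """Count tool-to-tool transitions and normalize them per source tool.

    Assumption: edge weight = relative frequency of the transition in the
    given trajectories. The paper's exact weighting may differ.
    """
    counts = defaultdict(Counter)
    for traj in trajectories:
        # Each consecutive pair (src, dst) is one observed transition.
        for src, dst in zip(traj, traj[1:]):
            counts[src][dst] += 1

    graph = {}
    for src, succ in counts.items():
        total = sum(succ.values())
        graph[src] = {dst: n / total for dst, n in succ.items()}
    return graph


def top_successors(graph, tool, k=2):
    """Return the k highest-weight successors of `tool` (hypothetical helper)."""
    succ = graph.get(tool, {})
    return sorted(succ, key=succ.get, reverse=True)[:k]


# Toy trajectories, invented for illustration only.
trajectories = [
    ["Weather", "Calendar", "Email"],
    ["Weather", "Calendar", "Maps"],
    ["Weather", "Maps"],
    ["Spotify", "Lyrics"],
]

graph = build_tool_graph(trajectories)
candidates = top_successors(graph, "Weather")
print(candidates)
```

In this toy data 'Spotify' never follows 'Weather', so it is absent from the candidate list and its description would not need to be placed in the prompt at that step.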
Key Novelty
Tool Graph Navigation for Tool Selection
  • Represent tools as nodes in a directed graph where weighted edges represent the probability of transitioning from one tool to another
  • Instead of searching the full library, the LLM only chooses from the current tool's 'successor' nodes, significantly reducing the search space
  • Dynamically update edge weights based on success/failure feedback, allowing the system to learn preferred paths and prune broken tools over time
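The feedback-driven weight update described above might look like the sketch below. The update rule (an exponential moving average toward 1 on success and 0 on failure), the learning rate, the pruning threshold, and the names `update_edge`, `pick_next`, 'PrimaryAPI', and 'BackupAPI' are assumptions made for illustration; the paper's actual rule is not reproduced here. The greedy `pick_next` also stands in for the LLM, which in ToolNet makes the choice among successors.

```python
def update_edge(graph, src, dst, success, lr=0.2, prune_below=0.05):
    """Nudge the weight of edge src -> dst using success/failure feedback.

    Assumed rule: new = old + lr * (target - old), with target 1.0 on
    success and 0.0 on failure. Edges that fall below `prune_below`
    are removed from the graph.
    """
    succ = graph.setdefault(src, {})
    old = succ.get(dst, 0.5)  # hypothetical default for unseen edges
    target = 1.0 if success else 0.0
    new = old + lr * (target - old)
    if new < prune_below:
        succ.pop(dst, None)
    else:
        succ[dst] = new
    return succ


def pick_next(graph, src):
    """Greedy stand-in for the LLM: pick the highest-weight successor."""
    succ = graph.get(src, {})
    return max(succ, key=succ.get) if succ else None


# Toy graph with invented weights: the primary tool starts out preferred.
graph = {"Search": {"PrimaryAPI": 0.8, "BackupAPI": 0.3}}
before = pick_next(graph, "Search")

# Simulate the primary tool failing five times in a row.
for _ in range(5):
    update_edge(graph, "Search", "PrimaryAPI", success=False)

after = pick_next(graph, "Search")
print(before, "->", after)
```

With these illustrative numbers the primary edge decays below the backup edge after five failures, so the selection switches. How many iterations the switch takes depends entirely on the assumed learning rate and starting weights.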
Architecture
Figure 1
Comparison between conventional In-context Tool Learning and ToolNet. Shows how ToolNet uses a graph structure.
Evaluation Highlights
  • Achieves comparable or better performance than Reflexion on APIBank and ToolBench while using 61.5% and 50.3% fewer tokens respectively
  • Improves Exact Match over ReAct on TabMWP by about 15 points (from 0.26 to 0.41), with the exact gap depending on the variant
  • Demonstrates resilience to tool failure: when a primary tool breaks, the system dynamically down-weights it and switches to a backup tool within ~20 iterations
Breakthrough Assessment
7/10
Simple but highly effective mechanism for scaling tool use. The graph-based approach solves the context window bottleneck for massive tools elegantly, though reliance on pre-existing trajectories for graph construction is a constraint.