
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models

Cheng-Yu Hsieh, Si-An Chen, Chun-Liang Li, Yasuhisa Fujii, Alexander Ratner, Chen-Yu Lee, Ranjay Krishna, Tomas Pfister
University of Washington, National Taiwan University, Google Cloud AI Research, Google Research
arXiv (2023)
Agent MM Benchmark

📝 Paper Summary

Multi-call tool use with fixed plan | Invoking internalized APIs
LLMs can effectively use new tools zero-shot by reading documentation (manuals) rather than relying on few-shot demonstrations, enabling scalable use of hundreds of unseen APIs.
Core Problem
Current LLM tool-use relies on few-shot demonstrations, which are hard to acquire, difficult to select without bias, and combinatorially intractable as the number of available tools scales up.
Why it matters:
  • Selecting the 'right' few-shot demonstrations is difficult, and biased selection can degrade performance
  • Providing demonstrations for hundreds of tools exceeds context windows and requires immense manual curation effort
  • Real-world APIs change frequently; maintaining up-to-date demonstrations for every version is impractical compared to using existing documentation
Concrete Example: When using a new cloud CLI tool, an LLM relying on few-shot demos might hallucinate a '-P' flag for port specification based on familiar Linux commands (scp), whereas an LLM reading the documentation correctly identifies the specific '--port' flag required by the new tool.
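The flag mix-up in this example can be made concrete with a small check. The sketch below is an illustration only, not part of the paper's method: the `newtool` CLI, its documentation excerpt, and the helper names are all hypothetical. It extracts the flags mentioned in a documentation snippet and reports any flag in a generated command that the documentation never mentions.

```python
import re

# Hypothetical documentation excerpt for an imaginary CLI called `newtool`.
DOC = """
Usage: newtool connect HOST [--port PORT] [--timeout SECONDS]
  --port PORT        Port to connect to.
  --timeout SECONDS  Seconds to wait before giving up.
"""

# A flag is one or two dashes followed by a letter, not glued to a preceding
# word character or dash (so hyphenated words are not mistaken for flags).
FLAG_PATTERN = re.compile(r"(?<![\w-])(--?[A-Za-z][\w-]*)")


def flags_in(text: str) -> set[str]:
    """Return every flag-like token found in the text."""
    return set(FLAG_PATTERN.findall(text))


def undocumented_flags(command: str, documentation: str) -> set[str]:
    """Return flags used in the command that the documentation never mentions."""
    return flags_in(command) - flags_in(documentation)


# A command using the scp-style flag versus one using the documented flag.
print(undocumented_flags("newtool connect example.com -P 2222", DOC))      # {'-P'}
print(undocumented_flags("newtool connect example.com --port 2222", DOC))  # set()
```

The first command is flagged because '-P' appears nowhere in the excerpt, while the second passes because '--port' is documented.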
Key Novelty
Documentation-based Zero-Shot Tool Use
  • Replace few-shot input-output examples (demonstrations) with textual descriptions of tool functionality and usage (documentation) in the prompt
  • Enable 'plug-and-play' usage of completely new tools (e.g., GroundingDINO, Track Anything) by simply pasting their README/docs into the context, without curating specific demos
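As a rough illustration of the two points above, a documentation-based prompt might be assembled as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Tool` class, the `build_documentation_prompt` helper, the prompt wording, and the one-line tool descriptions are hypothetical placeholders standing in for real README or API reference text.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    name: str
    documentation: str  # e.g. README text or an API reference excerpt


def build_documentation_prompt(tools: list[Tool], task: str) -> str:
    """Assemble a zero-shot prompt from tool documentation only (no demos)."""
    sections = [f"### Tool: {t.name}\n{t.documentation.strip()}" for t in tools]
    return (
        "You can use the tools documented below.\n\n"
        + "\n\n".join(sections)
        + f"\n\nTask: {task}\nPlan the tool calls needed to solve the task."
    )


# Placeholder descriptions; real usage would paste each tool's actual docs.
tools = [
    Tool("GroundingDINO", "Detects objects in an image that match a text query."),
    Tool("SAM", "Segments the object inside a given bounding box."),
]
prompt = build_documentation_prompt(tools, "Segment every dog in photo.jpg")
print(prompt)
```

Adding a new tool under this scheme means appending one more `Tool` entry with its documentation; no input-output demonstrations need to be written or selected.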
Architecture
Figure 2: Contrast between demonstration-based prompting (left) and documentation-based prompting (right) for tool use.
Evaluation Highlights
  • Zero-shot usage with documentation achieves comparable or better performance than few-shot usage on ScienceQA (79.91 vs 78.54) and TabMWP (92.69 vs 89.28)
  • On a new dataset of 200 unseen Google Cloud CLI tools, documentation-based prompting outperforms few-shot demonstrations by ~2.3x (F1 score 0.45 vs 0.19)
  • Successfully 're-invents' state-of-the-art pipelines like Grounded-SAM and Track Anything zero-shot by combining documentation from constituent tools (SAM, GroundingDINO, XMem)
Breakthrough Assessment
7/10
Strong empirical evidence that documentation is a more scalable alternative to demonstrations for tool use. While the method is simple prompt engineering, the finding significantly lowers the barrier for deploying LLMs with massive, unseen toolsets.