
Tools are under-documented: Simple Document Expansion Boosts Tool Retrieval

Xuan Lu, Haohang Huang, Rui Meng, Yaohui Jin, Wenjun Zeng, Xiaoyu Shen
Shanghai Jiao Tong University, Ningbo Key Laboratory of Spatial Intelligence and Digital Derivative, Institute of Digital Twin, Eastern Institute of Technology, Ningbo
arXiv (2025)
Agent, Benchmark, RAG

📝 Paper Summary

Tool retrieval, Tool profiling
The paper introduces a pipeline to enrich sparse tool documentation with structured fields (usage scenarios, limitations) and trains specialized dense retrievers and rerankers that significantly outperform baselines.
Core Problem
Tool retrieval fails because documentation is often incomplete, heterogeneous, and worded differently from how users phrase their requests, leaving a large semantic gap between queries and tool documents.
Why it matters:
  • Current benchmarks reveal that 41.6% of tool documents lack clear functional statements or usage contexts, forcing LLMs to guess parameters.
  • Inconsistent phrasing (e.g., seven ways of describing the same function) complicates retrieval, especially for ambiguous user queries.
  • Prior work focuses on query expansion or architecture changes, overlooking the root cause: the flawed underlying documentation data.
Concrete Example: In the ToolRet dataset, the same function is described in seven distinct formulations across sources. Some datasets, such as 'mnms', lack even a basic description field, making it impossible for retrievers to match a user query like 'find a restaurant' to the correct API.
Key Novelty
Tool-DE (Tool-Document Expansion) Framework
  • Systematically enriches raw tool documentation using a low-cost LLM pipeline to generate structured fields: function description, 'when-to-use', limitations, and tags.
  • Creates specialized large-scale training corpora (50k for retrieval, 200k for reranking) based on these enriched documents.
  • Trains dedicated models (Tool-Embed and Tool-Rank) specifically optimized for the enriched document structure.
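The structured fields listed above can be pictured as a small record that is flattened into text before indexing. The following is a minimal sketch, not the paper's actual schema: the class name `ExpandedToolDoc`, the field names, and the flattening format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExpandedToolDoc:
    """Hypothetical container for one enriched tool document."""

    name: str
    raw_description: str = ""        # original text; may be empty in some datasets
    function_description: str = ""   # generated: what the tool does
    when_to_use: str = ""            # generated: usage scenarios
    limitations: str = ""            # generated: known constraints
    tags: list[str] = field(default_factory=list)

    def to_retrieval_text(self) -> str:
        """Flatten the non-empty fields into one string for a retriever to index."""
        parts = [
            ("name", self.name),
            # Fall back to the raw description if nothing was generated.
            ("description", self.function_description or self.raw_description),
            ("when to use", self.when_to_use),
            ("limitations", self.limitations),
            ("tags", ", ".join(self.tags)),
        ]
        return "\n".join(f"{label}: {value}" for label, value in parts if value)
```

A document with no raw description still yields usable text once the generated fields are filled in, which is the situation the summary describes for datasets that lack a description field.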
Architecture
Figure 1
The four-stage pipeline for constructing Tool-DE: Expansion, Judgement, Refinement, and Human Validation.
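One way the four stages could fit together is sketched below. The summary names the stages but not how they interact, so the control flow here is an assumption: a judge accepts or rejects each expansion, rejected ones are refined for a bounded number of rounds, and anything still rejected is flagged for human review. The function `build_expanded_doc` and the callables `expand`, `judge`, and `refine` are hypothetical placeholders for LLM calls.

```python
from typing import Callable


def build_expanded_doc(
    raw_doc: str,
    expand: Callable[[str], str],
    judge: Callable[[str, str], bool],
    refine: Callable[[str, str], str],
    max_rounds: int = 2,
) -> tuple[str, bool]:
    """Return (expanded_doc, needs_human_review) for one raw tool document."""
    candidate = expand(raw_doc)                 # stage 1: Expansion
    for _ in range(max_rounds):
        if judge(raw_doc, candidate):           # stage 2: Judgement
            return candidate, False
        candidate = refine(raw_doc, candidate)  # stage 3: Refinement
    # Stage 4: Human Validation. In this sketch, only documents the judge
    # still rejects after the last refinement are flagged for review.
    return candidate, not judge(raw_doc, candidate)
```

Passing the stages in as callables keeps the sketch independent of any particular LLM client; in practice each would wrap a prompt and a model call.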
Evaluation Highlights
  • Tool-Embed-4B improves NDCG@10 by +10.23 over the MTEB state-of-the-art open-source model (Qwen3-Embedding-8B) on the Tool-DE benchmark.
  • Tool-Rank-4B achieves state-of-the-art performance with 56.44 NDCG@10, an improvement of +4.21 over the first-stage retriever.
  • Document expansion alone raises the zero-shot Recall@10 of a sparse retriever (BM25s) by +8.69.
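For readers unfamiliar with the two metrics reported above, here is a minimal sketch of how Recall@10 and NDCG@10 are commonly computed for a single query. It uses the linear-gain form of DCG; the paper's exact evaluation code is not shown in this summary, so the function names and the gain formulation are assumptions. Reported figures such as 56.44 are presumably these values averaged over queries and scaled by 100.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of the relevant tools that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k with linear gains; `relevance` maps tool id -> graded gain."""
    # Discounted gain of the actual ranking (rank 0 gets discount log2(2) = 1).
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    # Discounted gain of the best possible ordering of the judged tools.
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 2) for rank, gain in enumerate(ideal_gains))
    return dcg / idcg if idcg > 0 else 0.0
```

Recall@10 only asks whether the right tools are somewhere in the top ten, while NDCG@10 also rewards placing them near the top, which is why a reranker is typically evaluated with the latter.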
Breakthrough Assessment
7/10
Strong practical contribution by addressing the data quality bottleneck in tool retrieval. The proposed pipeline and models show significant gains, though the core technique (LLM-based document expansion) is a known strategy applied to a new domain.