DFSDT: Depth-First Search-based Decision Tree—a reasoning strategy allowing the LLM to explore multiple reasoning branches and backtrack if a path fails, used here for data annotation and inference.
RESTful API: Representational State Transfer API, a standard architectural style for web APIs in which clients access and manipulate resources through HTTP requests.
ReACT: Reasoning and Acting—a paradigm where LLMs generate reasoning traces and task-specific actions in an interleaved manner.
CoT: Chain-of-Thought—prompting LLMs to generate intermediate reasoning steps before the final answer.
ToolBench: The instruction-tuning dataset constructed in this paper, containing API documentation, instructions, and solution paths.
ToolEval: The automatic evaluation framework developed in this paper, which reports two metrics: pass rate and win rate against baselines.
OOD: Out-of-Distribution—data that differs significantly from the training data (e.g., unseen APIs).
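
To make the DFSDT entry above more concrete, the following is a minimal sketch of depth-first exploration with backtracking. It is a simplified illustration, not the paper's implementation: the names `dfs_decision_tree`, `toy_expand`, and `toy_is_success` are hypothetical, and the toy functions stand in for the LLM proposing candidate actions and for checking whether a solution path succeeded.

```python
from typing import Callable, List, Optional, Sequence

State = List[str]  # the sequence of actions (e.g., API calls) taken so far


def dfs_decision_tree(
    state: State,
    expand: Callable[[State], Sequence[str]],
    is_success: Callable[[State], bool],
    max_depth: int,
) -> Optional[State]:
    """Return the first successful action path found depth-first, else None."""
    if is_success(state):
        return state
    if len(state) >= max_depth:
        return None  # dead end: returning None makes the caller backtrack
    for action in expand(state):
        # Go deeper along this branch before trying any sibling branch.
        result = dfs_decision_tree(state + [action], expand, is_success, max_depth)
        if result is not None:
            return result
    return None  # every branch below this node failed


# Hypothetical stand-in for the LLM: always proposes the same three actions.
def toy_expand(state: State) -> Sequence[str]:
    return ["search_flights", "get_weather", "book_hotel"]


# Hypothetical success check: exactly one path counts as a valid solution.
def toy_is_success(state: State) -> bool:
    return state == ["get_weather", "book_hotel"]


path = dfs_decision_tree([], toy_expand, toy_is_success, max_depth=2)
print(path)
```

In this toy run, the search first exhausts every path that starts with `search_flights`, backtracks, and then finds the solution under `get_weather`. A single-path strategy such as CoT or ReACT would have no way to recover once committed to the first branch.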