DFSDT: Depth-First Search-based Decision Tree—a reasoning strategy where the model explores different action paths (branches) and backtracks if a path fails, rather than following a single linear chain
ReAct: Reasoning and Acting, a prompting technique where the model generates a thought trace before each action and then conditions its next step on the resulting observation
CoT: Chain-of-Thought—a prompting method encouraging models to break down problems into intermediate reasoning steps
SFT: Supervised Fine-Tuning—training a pre-trained model on labeled examples (instruction-response pairs) to follow instructions
API Retriever: A module that selects a small subset of relevant APIs from a massive pool based on the user's instruction
ToolEval: The automatic evaluation framework proposed in this paper, using ChatGPT as a judge to measure pass rates and win rates
REST API: Representational State Transfer API—a standard architectural style for web services allowing communication via HTTP methods (GET, POST, etc.)
Win Rate: The percentage of times an evaluator (ChatGPT) prefers the model's solution over a baseline solution
Pass Rate: The percentage of instructions for which the model successfully executes a valid sequence of actions to reach a solution
OOD: Out-of-Distribution, meaning the model is tested on data (APIs or instructions) it was not exposed to during training
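
To make the DFSDT entry above concrete, here is a minimal sketch of depth-first search with backtracking over action paths. It is not the paper's implementation: in the real system a language model proposes the next actions and decides when to abandon a branch, whereas here the callbacks `propose_actions`, `is_solution`, and `is_dead_end`, the `max_depth` limit, and the toy action names are all hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List, Optional

Path = List[str]


def dfs_decision_tree(
    state: Path,
    propose_actions: Callable[[Path], List[str]],  # hypothetical: stands in for the model
    is_solution: Callable[[Path], bool],           # hypothetical success check
    is_dead_end: Callable[[Path], bool],           # hypothetical failure check
    max_depth: int = 5,                            # assumed budget, not from the source
) -> Optional[Path]:
    """Return the first action path that reaches a solution, or None."""
    if is_solution(state):
        return state
    if len(state) >= max_depth or is_dead_end(state):
        return None  # abandon this branch; the caller tries the next sibling
    for action in propose_actions(state):
        result = dfs_decision_tree(
            state + [action], propose_actions, is_solution, is_dead_end, max_depth
        )
        if result is not None:
            return result
    return None  # every child failed, so backtrack further up


# Toy environment: the first API call fails, the second one leads to a solution.
def propose(state: Path) -> List[str]:
    if not state:
        return ["call_api_a", "call_api_b"]
    if state[-1] == "call_api_b":
        return ["finish"]
    return ["retry"]


def dead_end(state: Path) -> bool:
    return bool(state) and state[-1] == "call_api_a"


def solved(state: Path) -> bool:
    return bool(state) and state[-1] == "finish"


path = dfs_decision_tree([], propose, solved, dead_end)
print(path)
```

In this toy run the search first tries `call_api_a`, hits the simulated failure, backtracks, and then succeeds via `call_api_b`. A single linear chain (as in plain CoT or ReAct) would have stopped at the first failed branch.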