Self-instruct: A method where a strong LLM generates instruction-response pairs to create a dataset for fine-tuning other models
Nested tool calling: A scenario where the output of one tool execution is required as an input parameter for a subsequent tool call
ICL: In-Context Learning, i.e., prompting an LLM with examples (demonstrations) to guide its generation without updating its weights
Hallucination: The tendency of LLMs to generate plausible but incorrect or non-existent facts (e.g., inventing fake tool parameters)
Directed Acyclic Graph (DAG): A directed graph with no cycles; used here to describe the dependency flow between multiple tool calls in a single query
Argument F1: A metric evaluating whether the predicted parameter values match the ground truth, computed as the harmonic mean of precision and recall
ROUGE-L: A metric measuring text overlap based on the longest common subsequence, used here to evaluate the similarity of generated tool parameters to reference values
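
The two metrics above can be made concrete with a small sketch. This is an illustrative Python version under stated assumptions, not the evaluation code behind these definitions: arguments are compared as exact-match (name, value) pairs, text is tokenized on whitespace, and ROUGE-L is reported as the F1 of LCS-based precision and recall. The function names `argument_f1`, `rouge_l_f1`, and `lcs_length` are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as the F1 of LCS precision and recall (whitespace tokens assumed)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def argument_f1(predicted, gold):
    """F1 over exact-match (name, value) pairs; values must be hashable."""
    pred_items = set(predicted.items())
    gold_items = set(gold.items())
    matched = len(pred_items & gold_items)
    if matched == 0:
        return 0.0
    precision = matched / len(pred_items)
    recall = matched / len(gold_items)
    return 2 * precision * recall / (precision + recall)


# Hypothetical tool call: one of three predicted arguments matches one of two gold arguments.
predicted = {"city": "Paris", "unit": "celsius", "days": 3}
gold = {"city": "Paris", "unit": "fahrenheit"}
print(argument_f1(predicted, gold))  # precision 1/3, recall 1/2

# Three of four tokens are shared in order: "weather in Paris".
print(rouge_l_f1("weather in Paris tomorrow", "the weather in Paris"))
```

Exact matching penalizes a hallucinated extra argument (lower precision) and a missing one (lower recall), while the LCS-based score gives partial credit to free-text parameter values that differ only slightly from the reference.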