BFCL: Berkeley Function-Calling Leaderboard—a benchmark dataset for evaluating the ability of LLMs to invoke external tools
AST: Abstract Syntax Tree—a tree representation of code structure used here to strictly evaluate if the predicted tool call matches the ground truth syntactically
Schema Constraints: Rules defining valid tool usage, including function names, required argument keys, and data types (string, integer, etc.)
CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer
SFT: Supervised Fine-Tuning—training a pre-trained model on a specific labeled dataset to adapt it for a particular task
BM25: Best Matching 25—a probabilistic information retrieval algorithm used to rank documents (or tools) based on keyword matching
Anchor Grouping: A strategy that splits a large list of tools into smaller subsets while distributing the most relevant tools across the groups, so that none of them is missed
Consistency Validator: A module that checks whether a generated tool call conforms to the defined API schema (function name, argument keys, and argument types)
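
To make the Schema Constraints and Consistency Validator entries concrete, here is a minimal Python sketch of such a check. It is not the implementation behind this glossary. The schema layout, the `TYPE_MAP` table, the `validate_call` function, and the `get_weather` tool are all hypothetical names chosen for illustration.

```python
from typing import Any

# Hypothetical mapping from schema type names to Python types (an assumption).
TYPE_MAP = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

# Hypothetical tool schema: function name -> parameter specs.
SCHEMAS = {
    "get_weather": {
        "parameters": {
            "city": {"type": "string", "required": True},
            "days": {"type": "integer", "required": False},
        }
    }
}


def validate_call(call: dict[str, Any], schemas: dict[str, dict]) -> list[str]:
    """Return a list of schema violations; an empty list means the call is valid."""
    schema = schemas.get(call.get("name"))
    if schema is None:
        return [f"unknown function: {call.get('name')!r}"]

    params = schema["parameters"]
    args = call.get("arguments", {})
    errors = []

    # Every required argument key must be present.
    for key, spec in params.items():
        if spec.get("required", False) and key not in args:
            errors.append(f"missing required argument: {key}")

    # Every supplied argument must be declared and have the declared type.
    for key, value in args.items():
        if key not in params:
            errors.append(f"unexpected argument: {key}")
            continue
        type_name = params[key]["type"]
        is_bool = isinstance(value, bool)
        if type_name == "boolean":
            ok = is_bool
        else:
            # bool is a subclass of int in Python, so reject it explicitly.
            ok = isinstance(value, TYPE_MAP[type_name]) and not is_bool
        if not ok:
            errors.append(
                f"argument {key}: expected {type_name}, got {type(value).__name__}"
            )
    return errors
```

For example, `validate_call({"name": "get_weather", "arguments": {"city": "Paris", "days": 3}}, SCHEMAS)` returns an empty list, while a call that omits `city` or passes `days` as a string yields one error message per violation.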