BFCL: Berkeley Function Calling Leaderboard, a comprehensive evaluation set for assessing LLM tool-use capabilities across multiple programming languages and scenarios
API-Bank: A benchmark for evaluating tool-augmented LLMs, divided into levels of difficulty (Level-1 for calling a given API, Level-2 for retrieving a suitable API and then calling it)
Zero-shot generalization: The ability of the model to use tools/APIs it has never seen during training, relying only on the provided definitions
Hallucination: In this context, a model inventing parameter values that appear in neither the user query nor the system prompt
Speciation: The initial step in ToolACE's API synthesis where an 'API context tree' is created to define possible domains and functionalities from raw documents
Adaptation: The step where specific functionalities from the context tree are assigned to individual synthetic APIs to ensure distinct capabilities
Evolution: The iterative process of refining and diversifying synthetic APIs (e.g., adding constraints, mutating parameters) based on feedback
Model-based Checker: An LLM agent that verifies semantic correctness (e.g., consistency, absence of hallucinations) in cases that rule-based checks cannot cover
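
To make the Hallucination and Model-based Checker entries concrete, here is a minimal sketch of a crude rule-based check for ungrounded parameter values. It is an illustration only, not ToolACE's implementation: the function name `ungrounded_arguments`, the substring-matching heuristic, and the sample query are all assumptions made for this example.

```python
from typing import Any


def ungrounded_arguments(call_args: dict[str, Any], *contexts: str) -> list[str]:
    """Return the names of arguments whose values appear in none of the contexts.

    Hypothetical helper: a case-insensitive substring match stands in for a
    real grounding check.
    """
    haystack = " ".join(contexts).lower()
    flagged = []
    for name, value in call_args.items():
        if isinstance(value, bool):
            continue  # booleans are rarely stated literally in text, so skip them
        if str(value).lower() not in haystack:
            flagged.append(name)
    return flagged


# Assumed example data: the phone number is not mentioned anywhere in the query.
query = "Book a table for 4 at Luigi's in Boston tomorrow at 7pm"
call = {"restaurant": "Luigi's", "city": "Boston", "party_size": 4, "phone": "555-0199"}
print(ungrounded_arguments(call, query))  # ['phone']
```

A literal match like this would wrongly flag a value the user paraphrased (for example "four" instead of 4) and would miss an invented value that happens to occur elsewhere in the text. Those are the semantic cases a model-based checker is meant to handle.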