MCP: Model Context Protocol, an open standard that lets LLM applications connect to external data sources and tools, which are exposed by servers, through a standardized interface
Trajectory: The sequence of steps an agent takes to complete a task, typically consisting of reasoning (thought), tool calls, and the tools' outputs, and ending with a final answer
SFT: Supervised Fine-Tuning, further training of a pre-trained model on a labeled, task-specific dataset to improve its performance on a target task
BFCL: Berkeley Function Calling Leaderboard, a benchmark for evaluating the ability of LLMs to invoke software functions correctly
Edge case: A rare or difficult scenario, such as when a requested tool is unavailable or returns an error
Hallucination: When an LLM generates incorrect or fabricated information, such as inventing a tool that doesn't exist
Pareto frontier: The set of optimal solutions where no improvement can be made to one objective without degrading another; used here to describe trade-offs in benchmark performance
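
To make the Pareto frontier entry concrete, here is a minimal sketch that picks the non-dominated entries from a set of benchmark scores. The function name `pareto_frontier`, the model names, and the scores are all hypothetical and chosen only for illustration; the sketch assumes that higher is better on every benchmark.

```python
from typing import Dict, List, Tuple


def pareto_frontier(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names whose score tuple is not dominated by any other.

    Assumes higher is better on every objective (illustrative choice).
    """

    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        # a dominates b if it is at least as good everywhere
        # and strictly better on at least one objective.
        at_least_as_good = all(x >= y for x, y in zip(a, b))
        strictly_better = any(x > y for x, y in zip(a, b))
        return at_least_as_good and strictly_better

    return [
        name
        for name, s in scores.items()
        if not any(
            dominates(t, s) for other, t in scores.items() if other != name
        )
    ]


# Hypothetical (benchmark_1, benchmark_2) scores, not real results.
models = {
    "model_a": (0.80, 0.60),
    "model_b": (0.70, 0.75),
    "model_c": (0.65, 0.55),
    "model_d": (0.80, 0.50),
}

frontier = pareto_frontier(models)
print(frontier)
```

In this toy data, model_a and model_b each lead on one benchmark and neither dominates the other, so both sit on the frontier. model_c and model_d are each matched or beaten on both benchmarks by model_a, so they fall off it.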