SLM: Small Language Model—used here for efficient tool grounding and preliminary answer generation (e.g., GPT-Neo 1.3B)
LLM: Large Language Model—used here only for the final API call generation and execution (e.g., GPT-3, GPT-J)
grounding score: A linear combination of semantic similarity and pattern similarity used to rank tools
semantic similarity: Cosine similarity between the embeddings of the user query and the tool description
pattern similarity: A score measuring how well the format (numbers, dates, text) of a tool's output matches a preliminary zero-shot guess
FLOPS: Floating Point Operations Per Second; used here to quantify computational cost and efficiency
Add-lambda smoothing: A technique to smooth probability distributions by adding a small constant lambda to counts, preventing zero probabilities for unseen patterns
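
The scoring terms above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method's actual implementation: the mixing weight `alpha`, the pattern categories ("number", "date", "text"), the example counts, and all function names are hypothetical and were chosen for readability.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors (an assumption of this sketch)
    return dot / (norm_u * norm_v)


def add_lambda_probs(counts, categories, lam=1.0):
    """Add-lambda smoothing: add `lam` to every count, then normalize."""
    total = sum(counts.get(c, 0) for c in categories) + lam * len(categories)
    return {c: (counts.get(c, 0) + lam) / total for c in categories}


def grounding_score(semantic, pattern, alpha=0.5):
    """Linear combination of the two similarities; `alpha` is a hypothetical weight."""
    return alpha * semantic + (1 - alpha) * pattern


# Hypothetical pattern counts observed in one tool's outputs.
categories = ("number", "date", "text")
counts = Counter({"number": 3, "text": 1})
probs = add_lambda_probs(counts, categories, lam=1.0)

# "date" was never observed but still receives non-zero probability.
semantic = cosine_similarity([1.0, 0.0], [1.0, 0.0])
score = grounding_score(semantic, probs["number"], alpha=0.5)
```

Here the smoothed probability of the guessed pattern stands in for the pattern similarity; how the two similarities are actually computed and weighted is not specified by the definitions above.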