Semantic Context (SC): The collection of natural language descriptions (e.g., docstrings) for all currently available tools, used to represent actions
SC-LinUCB: A variant of the LinUCB bandit algorithm that uses semantic embeddings of tool descriptions as action features
Regret: The difference between the cumulative reward the agent would have obtained by always acting optimally and the cumulative reward it actually received
Catastrophic Forgetting: The tendency of a neural network or other learning algorithm to abruptly lose much or all of its previously learned knowledge when it is trained on new information
FiReAct (Filter-Reason-Act): A pipeline proposed in this paper that first filters a large toolset via retrieval, then uses an LLM to reason over the remaining candidates and select the final tool
One-hot encoding: A representation where each item (tool) is a vector with a single 1 and zeros everywhere else; it encodes no shared meaning between items
LinUCB (Linear Upper Confidence Bound): A contextual bandit algorithm that assumes the expected reward is a linear function of context features and selects the action with the highest upper confidence bound on its estimated reward
In-context learning (ICL): The ability of LLMs to learn tasks from examples or instructions provided in the prompt without parameter updates
Effective noise: A term in regret bounds summarizing observation noise and model approximation error; lower effective noise implies faster learning
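
The LinUCB, one-hot encoding, and regret entries above can be made concrete with a small sketch. It is not the paper's SC-LinUCB implementation: it assumes a single weight vector shared across all actions, and the names SharedLinUCB and cumulative_regret, the parameters alpha and ridge, and the three-dimensional "semantic" vectors are all hypothetical choices made for illustration.

```python
import numpy as np


class SharedLinUCB:
    """LinUCB sketch with one weight vector shared by all actions (an assumption)."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha            # exploration strength (illustrative default)
        self.A = ridge * np.eye(dim)  # ridge-regularized Gram matrix of chosen features
        self.b = np.zeros(dim)        # reward-weighted sum of chosen features

    def scores(self, features):
        """Return the upper confidence bound for each row of `features`."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b        # ridge-regression estimate of the weights
        mean = features @ theta       # estimated reward per action
        # Confidence width sqrt(x^T A^{-1} x), computed for every row x at once.
        width = np.sqrt(np.einsum("ij,jk,ik->i", features, A_inv, features))
        return mean + self.alpha * width

    def select(self, features):
        """Pick the action whose upper confidence bound is largest."""
        return int(np.argmax(self.scores(features)))

    def update(self, x, reward):
        """Fold one observed (feature vector, reward) pair into the model."""
        self.A += np.outer(x, x)
        self.b += reward * x


def cumulative_regret(optimal_rewards, received_rewards):
    """Total reward of the optimal policy minus total reward actually received."""
    return float(np.sum(optimal_rewards) - np.sum(received_rewards))


if __name__ == "__main__":
    one_hot = np.eye(3)  # three tools with no shared structure
    # Made-up dense vectors standing in for description embeddings:
    # tools 0 and 1 are similar, tool 2 is unrelated.
    semantic = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]])

    for name, feats in [("one-hot", one_hot), ("semantic", semantic)]:
        bandit = SharedLinUCB(dim=3)
        bandit.update(feats[0], reward=1.0)  # observe a reward for tool 0 only
        print(name, bandit.scores(feats))
```

With one-hot features, the reward observed for tool 0 leaves the estimated reward of the other tools unchanged, whereas with the dense vectors the similar tool 1 inherits part of that estimate. This is the sense in which one-hot encoding implies no shared meaning between items.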