ReAct: Reasoning and Acting, a prompting strategy where LLMs generate reasoning traces and tool actions in an interleaved manner
Chain-of-Thought (CoT): A prompting method that encourages LLMs to generate intermediate reasoning steps before producing a final answer
Chameleon: A tool-augmented LLM method that uses a controller to compose tools for solving subtasks
Hallucination: When an LLM generates plausible but incorrect or ungrounded information
Reference Corpora: External datasets (text, tables, graphs) provided in ToolQA that contain the ground-truth information needed to answer questions
Programmatic Answer Generation: The process of generating ground-truth answers by running code (operators) that simulates correct tool usage on the reference data
ToolQA: The specific benchmark dataset introduced in this paper
GSM8K: A dataset of grade school math word problems, used here as a source for mathematical reference data
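As a rough illustration of Programmatic Answer Generation as defined above, the sketch below fills a question template and computes its ground-truth answer by running small operators over reference data. The table rows, column names, and function names (`filter_rows`, `average`, `generate_qa`) are all hypothetical and invented for this example; they are not ToolQA's actual operators or data.

```python
# Hypothetical reference table: rows and column names are invented for illustration.
FLIGHTS = [
    {"flight": "AA100", "date": "2022-01-05", "delay_min": 12},
    {"flight": "AA100", "date": "2022-01-06", "delay_min": 30},
    {"flight": "UA200", "date": "2022-01-05", "delay_min": 0},
]


def filter_rows(rows, **conditions):
    """Operator: keep rows whose fields equal every given condition."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def average(rows, column):
    """Operator: mean of a numeric column over the given rows."""
    values = [r[column] for r in rows]
    return sum(values) / len(values)


def generate_qa(flight):
    """Fill a question template and compute its answer from the reference data."""
    question = f"What was the average delay of flight {flight} in minutes?"
    # The answer comes from executing operators, not from an LLM,
    # so it is grounded in the reference table by construction.
    answer = average(filter_rows(FLIGHTS, flight=flight), "delay_min")
    return question, answer


question, answer = generate_qa("AA100")
```

Because the answer is computed directly from the reference data, a model can only match it by querying that data with tools, which is the property the definition relies on.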