ReAct (Reason + Act): A prompting paradigm in which LLMs generate reasoning traces and task-specific actions in an interleaved manner.
Chain-of-Thought (CoT): A prompting method where models generate intermediate reasoning steps before the final answer.
Act-only: A baseline where the model generates actions directly based on observations without explicit reasoning traces.
Hallucination: When a language model generates plausible-sounding but factually incorrect information.
ALFWorld: A synthetic text-based game benchmark requiring embodied reasoning and multi-step planning in a household environment.
WebShop: A benchmark simulating an online shopping website where agents must follow instructions to find and buy products.
CoT-SC: Chain-of-Thought with Self-Consistency, which samples multiple reasoning paths and takes the majority-vote answer.
Zero-shot / Few-shot: Prompting the model with no examples (zero-shot) or a few examples (few-shot) of the task to guide its behavior.
Imitation Learning (IL): Training an agent to mimic expert demonstrations.
Reinforcement Learning (RL): Training an agent to maximize a reward signal through trial and error.
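
The majority-vote step in the CoT-SC entry above can be illustrated with a minimal sketch. It assumes the final answers have already been extracted from each sampled reasoning path. The function name `majority_vote`, the light normalization, and the sample answers are all hypothetical choices made for illustration, not part of any particular implementation.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer and the fraction of samples agreeing with it.

    `answers` is a list of final-answer strings, one per sampled reasoning path.
    (Hypothetical helper for illustration only.)
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Assumption: trim whitespace and lowercase so trivially different
    # spellings of the same answer are counted together.
    normalized = [a.strip().lower() for a in answers]
    # most_common(1) returns [(answer, count)]; on a tie, the answer
    # encountered first wins.
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)


# Made-up answers standing in for five sampled reasoning paths.
sampled = ["Paris", "paris ", "Lyon", "Paris", "Marseille"]
answer, agreement = majority_vote(sampled)
print(answer, agreement)
```

The agreement fraction is returned alongside the answer because it gives a rough signal of how consistent the sampled paths were; a caller could use a low value as a cue to fall back to another method.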