Chain of thought: A series of intermediate natural language reasoning steps that lead to the final answer for a problem
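To make the definition concrete, here is a sketch of a single chain-of-thought exemplar (the tennis-ball problem is the canonical illustration from the chain-of-thought literature; its exact wording here should be treated as illustrative, not a quote):

```python
# A chain-of-thought exemplar: intermediate reasoning steps appear
# between the question and the final answer, unlike a plain
# input-output pair.
cot_exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)
print(cot_exemplar)
```

The reasoning sentence before "The answer is 11." is what distinguishes this exemplar from standard prompting.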
Few-shot prompting: Providing a language model with a few input-output examples in the context window to guide its behavior for a new task
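A minimal sketch of how a few-shot prompt is assembled (the exemplars and formatting below are assumptions for illustration, not a prescribed template): exemplar input-output pairs are concatenated in the context window, followed by the new input the model should complete.

```python
# Assemble a few-shot prompt: concatenate exemplar (input, output)
# pairs, then append the new query with an empty answer slot for
# the model to fill in.
exemplars = [
    ("2 + 2", "4"),
    ("7 + 5", "12"),
]
new_input = "3 + 9"
prompt = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in exemplars)
prompt += f"Q: {new_input}\nA:"
print(prompt)
```

The model is never fine-tuned; its behavior on the new input is guided entirely by the pattern in the exemplars.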
Emergent ability: A capability that is not present in smaller models but appears suddenly as the model scale increases past a certain threshold
GSM8K: Grade School Math 8K—a benchmark of high-quality grade school math word problems
Greedy decoding: A generation strategy where the model selects the highest probability token at each step
PaLM: Pathways Language Model—a large dense language model developed by Google with up to 540 billion parameters
LaMDA: Language Models for Dialog Applications—a family of Transformer-based models specialized for dialog
Codex: A GPT-3 variant fine-tuned on code, capable of code generation and reasoning
Standard prompting: The traditional few-shot method where exemplars consist only of input-output pairs without intermediate steps
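For contrast with the chain-of-thought entry above, a standard-prompting exemplar for the same kind of problem gives only the input and the final answer (wording is illustrative):

```python
# Standard few-shot exemplar: an input-output pair with no
# intermediate reasoning steps before the answer.
standard_exemplar = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: The answer is 11."
)
print(standard_exemplar)
```

The model sees the same question, but must jump directly from problem to answer with no demonstrated reasoning.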
OOD: Out-of-Distribution—evaluating on data different from the training or exemplar distribution (e.g., longer sequences than seen in examples)