LRM: Large Reasoning Model—a model, such as OpenAI o1 or DeepSeek-R1, capable of complex multi-step reasoning
ZPD: Zone of Proximal Development—educational theory defining the gap between what a learner can do unaided and what they can do with guidance
Rejection Sampling: Generating multiple outputs from a model and keeping only those that yield the correct final answer
Entropy: A measure of uncertainty in the model's next-token prediction distribution
Perplexity (PPL): A metric measuring how surprised a model is by a sequence of text; lower PPL means the text is more predictable/natural to the model
SFT: Supervised Fine-Tuning—training a model on a labeled dataset
NLL: Negative Log-Likelihood—a loss function penalizing the model for assigning low probability to the correct token
Teacher Ceiling: The performance limit imposed on a student model because the teacher cannot generate valid training data for problems beyond its own unassisted capability
Hindsight Hint: Providing the ground-truth answer or intermediate steps to the model to guide it toward a correct solution it couldn't find independently
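Rejection sampling as defined above can be sketched in a few lines. This is an illustrative toy, not any paper's actual pipeline: `sample_model` is a hypothetical stand-in for a real LLM call, and the arithmetic problem and noisy answer pool are invented for demonstration.

```python
import random

# Hypothetical stand-in for an LLM call: returns a (reasoning, answer) pair.
# It "solves" the prompt correctly only some of the time, mimicking a noisy model.
def sample_model(prompt):
    answer = random.choice([391, 391, 381, 401])  # toy answer pool for 17 * 23
    return f"step-by-step work for: {prompt}", answer

def rejection_sample(prompt, ground_truth, n=16):
    """Draw n candidate solutions; keep only those whose final answer matches."""
    kept = []
    for _ in range(n):
        reasoning, answer = sample_model(prompt)
        if answer == ground_truth:
            kept.append((reasoning, answer))
    return kept

random.seed(0)
accepted = rejection_sample("Compute 17 * 23.", ground_truth=391)
print(f"kept {len(accepted)} of 16 samples")
```

The kept traces are what would typically be used as SFT data; note the teacher-ceiling issue falls out directly: if the model never samples the correct answer, `kept` stays empty and the problem contributes no training data.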
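Entropy, NLL, and perplexity are tightly linked, which a small numerical example makes concrete. The four-token vocabulary and its probabilities below are made-up illustrative numbers, not from any real model.

```python
import math

# Toy next-token distribution over a 4-token vocabulary (hypothetical numbers).
probs = {"the": 0.5, "a": 0.25, "cat": 0.15, "dog": 0.10}

# Entropy: expected surprise of the distribution, in nats.
entropy = -sum(p * math.log(p) for p in probs.values())

# NLL of an observed token: -log p(token); lower when the model was confident.
nll_the = -math.log(probs["the"])   # confident prediction -> small loss
nll_dog = -math.log(probs["dog"])   # unlikely token -> large loss

# Perplexity of a sequence: exp(mean NLL over its tokens).
sequence = ["the", "cat", "the", "a"]
mean_nll = sum(-math.log(probs[t]) for t in sequence) / len(sequence)
ppl = math.exp(mean_nll)

print(f"entropy={entropy:.3f} nats  NLL(the)={nll_the:.3f}  PPL={ppl:.2f}")
```

Perplexity is just the exponentiated average NLL, so a lower PPL means the model assigned higher probability to the observed tokens on average.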