CoT: Chain-of-Thought—a prompting technique where models generate intermediate reasoning steps before the final answer
SFT: Supervised Fine-Tuning—training a pre-trained model on a curated dataset of instruction–response pairs
Teacher-Student Distillation: Using a larger, stronger model (teacher) to generate training data for a smaller model (student)
OpenMath CoT: A concise reasoning format proposed in this paper that removes excessive verbiage found in standard Llama CoT traces
Nucleus Sampling: A decoding strategy that samples from the smallest set of top tokens whose cumulative probability exceeds a threshold p
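A minimal sketch of nucleus (top-p) sampling over a logit vector, using NumPy; the function name and signature are illustrative, not from the paper:

```python
import numpy as np

def nucleus_sample(logits, p=0.9, rng=None):
    """Sample a token index from the smallest set of top tokens whose
    cumulative probability exceeds p (nucleus / top-p sampling)."""
    rng = rng or np.random.default_rng()
    probs = np.exp(logits - logits.max())   # softmax, numerically stable
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]         # token indices, most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1    # smallest k whose cumulative prob reaches p
    nucleus = order[:cutoff]
    nucleus_probs = probs[nucleus] / probs[nucleus].sum()  # renormalize within nucleus
    return int(rng.choice(nucleus, p=nucleus_probs))
```

Lower p concentrates sampling on the most probable tokens; p = 1.0 recovers full sampling.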
Fair Downsampling: A sampling method ensuring all unique questions are represented as equally as possible when reducing dataset size
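One way to realize fair downsampling is a round-robin pass over questions, taking one solution per unique question per round until the budget is spent; this is a sketch of the idea, not necessarily the paper's exact procedure:

```python
import random
from collections import defaultdict

def fair_downsample(examples, target_size, seed=0):
    """Downsample (question, solution) pairs so every unique question keeps
    as equal a share of the budget as possible."""
    by_question = defaultdict(list)
    for question, solution in examples:
        by_question[question].append(solution)
    rng = random.Random(seed)
    for solutions in by_question.values():
        rng.shuffle(solutions)          # pick which solutions survive at random
    kept, round_idx = [], 0
    while len(kept) < target_size:      # one solution per question per round
        took_any = False
        for question, solutions in by_question.items():
            if round_idx < len(solutions):
                kept.append((question, solutions[round_idx]))
                took_any = True
                if len(kept) == target_size:
                    return kept
        if not took_any:                # every question exhausted
            break
        round_idx += 1
    return kept
```

Questions with many solutions lose more of them, while rarely-solved questions keep everything, so all unique questions stay represented.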
Decontamination: The process of removing training examples that are too similar to test set benchmarks to prevent unfair evaluation
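A simple decontamination check can flag training questions that share any word n-gram with a test question; real pipelines often use fuzzier similarity measures, so treat this as an illustrative baseline:

```python
def ngrams(text, n=8):
    """Set of word n-grams in a lowercased text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_questions, test_questions, n=8):
    """Drop any training question sharing an n-gram with the test set."""
    test_grams = set()
    for question in test_questions:
        test_grams |= ngrams(question, n)
    return [q for q in train_questions if not (ngrams(q, n) & test_grams)]
```

Smaller n makes the filter stricter (more near-duplicates removed) at the cost of more false positives.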
Rejection Sampling: Generating multiple solutions and keeping only those that reach the correct final answer