SFT: Supervised Fine-Tuning—training a pre-trained model on specific input-output pairs to adapt it to a task.
Distillation: Training a smaller student model to mimic the outputs of a larger, more capable teacher model.
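To make "mimic the outputs of a teacher" concrete, here is a minimal sketch of the classic logit-distillation objective: the student minimizes the KL divergence between its temperature-softened output distribution and the teacher's. This is the textbook formulation, not necessarily this paper's setup (which distills via fine-tuning on teacher-generated text); function names and the temperature value are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-softened softmax; higher T flattens the distribution.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    # KL(teacher || student) over the softened distributions:
    # zero when the student exactly matches the teacher.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# A perfectly matching student incurs zero loss; any mismatch is positive.
print(distill_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # ~0.0
print(distill_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) > 0)  # True
```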
Chain-of-Thought: A reasoning technique where models generate intermediate steps before producing a final answer.
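The three terms above combine in distillation pipelines like the one this paper studies: a teacher's chain-of-thought traces are formatted into input-output pairs for SFT of the student. A minimal sketch of that data-formatting step, where the field names, the example trace, and the `<think>` delimiters are illustrative assumptions rather than the paper's exact format:

```python
import json

# Hypothetical teacher output: a question, the teacher's intermediate
# reasoning (chain of thought), and its final answer.
trace = {
    "question": "What is 12 * 13?",
    "reasoning": "12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
    "answer": "156",
}

def to_sft_example(t):
    # Prompt/completion pair: the student is trained to reproduce the
    # teacher's reasoning steps followed by the final answer.
    return {
        "prompt": t["question"],
        "completion": f"<think>{t['reasoning']}</think>\n{t['answer']}",
    }

example = to_sft_example(trace)
print(json.dumps(example, indent=2))
```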
DeepSeek-R1: A strong reasoning model used as a teacher for generating reasoning traces.
QwQ-32B: A reasoning model from the Qwen team, which this paper finds to be a superior teacher despite its lower benchmark scores.
AIME: American Invitational Mathematics Examination—a challenging math benchmark.
GPQA: Graduate-Level Google-Proof Q&A—a difficult science and reasoning benchmark.
fastText: A library for efficient text classification and representation learning.
LiveCodeBench: A benchmark for evaluating code generation, built from competitive programming problems.