SFT: Supervised Fine-Tuning—further training a pre-trained model on a smaller, labeled dataset to adapt it to specific tasks
CoT: Chain-of-Thought—a prompting or reasoning style where the model generates intermediate logical steps before producing the final answer
Inference-time computation: The computational work (processing steps/tokens) a model performs while generating an answer, which correlates with reasoning depth
OOD: Out-of-Distribution—test cases that differ significantly from the data seen during training (e.g., different problem types or languages)
AIME: American Invitational Mathematics Examination—a challenging invitational competition for top-scoring American high-school mathematics students
Pass@1: An evaluation metric measuring the percentage of problems the model solves on its first generated attempt (one sample per problem)
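As a concrete illustration, Pass@1 is the k=1 case of the standard unbiased pass@k estimator (Chen et al.'s formulation from the Codex evaluation), which estimates the chance that at least one of k samples is correct given n total generations, c of them correct. A minimal sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of
    k samples drawn (without replacement) from n generations,
    of which c are correct, passes.

    n: total samples generated per problem
    c: number of those samples that are correct
    k: number of samples "allowed" per problem
    """
    if n - c < k:
        # Fewer incorrect samples than k: a correct one is guaranteed.
        return 1.0
    # 1 - P(all k chosen samples are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)

# For k=1 this reduces to the fraction of correct samples:
# pass_at_k(n=10, c=3, k=1) -> 0.3
```

Averaging this quantity over all problems in a benchmark gives the reported Pass@1 (or Pass@k) score.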
Foundation Model: A large-scale model (like Llama or Qwen) pre-trained on vast amounts of data, serving as a base for specific applications
Elicitation: The process of triggering or unlocking capabilities already present in a model's pre-trained weights, rather than teaching new knowledge
Self-Verification: A reasoning step where the model explicitly checks its own intermediate work for errors