pass@1: A metric measuring the percentage of problems where the model's first generated solution passes all unit tests.
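A minimal sketch of how this family of metrics is typically computed, using the standard unbiased pass@k estimator from Chen et al. (2021); the function name and example counts are illustrative, not from the source:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n = samples generated per problem,
    c = samples that pass all unit tests, k = attempt budget.
    For k = 1 this reduces to the plain pass rate c / n."""
    if n - c < k:
        # Fewer failing samples than the budget: some draw must pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g. 10 samples per problem, 3 passing: pass@1 = 0.3
print(f"{pass_at_k(10, 3, 1):.2f}")
```

The final score is this quantity averaged over all problems in the benchmark.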
DeepSeek-R1: A state-of-the-art open-weights reasoning model, trained largely via reinforcement learning, that generates long chain-of-thought traces before answering.
SFT: Supervised Fine-Tuning—training a pre-trained model on a labeled dataset of inputs and outputs.
Chain-of-Thought: A prompting/training technique where the model generates intermediate reasoning steps before the final answer.
Tree-sitter: A parser generator and incremental parsing library that builds syntax trees for source code; used here to check the syntactic correctness of generated solutions.
Nucleus Sampling: A text decoding method (top-p) that samples from the smallest set of tokens whose cumulative probability exceeds a threshold p.
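The rule above can be sketched over a toy token distribution; this is an illustrative reference implementation, not the decoding code of any particular inference engine:

```python
import random

def nucleus_sample(probs: dict[str, float], p: float = 0.9, rng=random) -> str:
    """Top-p sampling: sort tokens by probability, keep the smallest
    prefix whose cumulative mass reaches p, renormalize, and sample
    from that truncated set."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    nucleus, total = [], 0.0
    for token, prob in ranked:
        nucleus.append((token, prob))
        total += prob
        if total >= p:          # smallest set exceeding the threshold
            break
    # Sample proportionally within the nucleus (implicit renormalization).
    r = rng.random() * total
    for token, prob in nucleus:
        r -= prob
        if r <= 0:
            return token
    return nucleus[-1][0]

dist = {"the": 0.6, "a": 0.3, "cat": 0.1}
print(nucleus_sample(dist, p=0.5))  # nucleus is just {"the"}
```

With a low p the tail is cut aggressively (here only the top token survives); as p approaches 1 the method degenerates to plain sampling from the full distribution.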
SGLang: A structured generation language and engine used for efficient LLM inference.
IOI: International Olympiad in Informatics—a prestigious competitive programming contest, used here as a benchmark for C++ performance.