CodeForces: A competitive programming platform with a rating system where participants solve algorithmic problems in timed contests; ratings of 2400 and above correspond to the Grandmaster tier or higher
IOI: International Olympiad in Informatics—the most prestigious annual algorithmic competition for secondary school students
RL: Reinforcement Learning—training models by providing rewards for correct behaviors (in this case, correct code execution) rather than just mimicking human text
Chain-of-Thought: A technique where the model generates intermediate reasoning steps before producing the final answer
Test-time compute: The amount of computational resources (time/tokens) a model uses during inference to refine its answer, often via sampling many solutions or long reasoning chains
Subtask: A part of a competitive programming problem with more restrictive, and therefore easier, constraints (e.g., smaller input size) that awards partial points
Clustering: A strategy used in o1-ioi to group generated programs by their behavior on test cases, so that a representative, likely-correct solution can be selected
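Elo rating: A relative rating system in which a competitor's rating rises or falls according to results against other rated competitors; CodeForces uses an Elo-style rating to rank participant skill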
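
To make the Clustering entry concrete, here is a minimal sketch of grouping candidate programs by their behavior on shared test inputs. It is an illustration under stated assumptions, not the o1-ioi implementation: candidate programs are modeled as plain Python callables, the names `behavior_signature`, `cluster_by_behavior`, and `pick_representative` are hypothetical, and choosing the first member of the largest cluster is an assumed selection heuristic.

```python
from collections import defaultdict
from typing import Callable, Hashable, List, Sequence, Tuple

# Hypothetical stand-in for a generated solution: a function from input to output.
Program = Callable[[int], Hashable]


def behavior_signature(program: Program, test_inputs: Sequence[int]) -> Tuple:
    """Run one program on every test input and record what it does."""
    outputs = []
    for x in test_inputs:
        try:
            outputs.append(("ok", program(x)))
        except Exception as exc:  # a crash is also a kind of behavior
            outputs.append(("error", type(exc).__name__))
    return tuple(outputs)


def cluster_by_behavior(
    programs: Sequence[Program], test_inputs: Sequence[int]
) -> List[List[int]]:
    """Group program indices whose outputs agree on all test inputs."""
    clusters = defaultdict(list)
    for idx, program in enumerate(programs):
        clusters[behavior_signature(program, test_inputs)].append(idx)
    return list(clusters.values())


def pick_representative(
    programs: Sequence[Program], test_inputs: Sequence[int]
) -> int:
    """Assumed heuristic: return the first member of the largest cluster."""
    clusters = cluster_by_behavior(programs, test_inputs)
    largest = max(clusters, key=len)
    return largest[0]


# Toy candidates for "sum of 1..n"; two agree, two are buggy in different ways.
candidates = [
    lambda n: n * (n + 1) // 2,
    lambda n: sum(range(n + 1)),
    lambda n: n * n,
    lambda n: n * (n - 1) // 2,
]
test_inputs = [0, 1, 2, 5, 10]

clusters = cluster_by_behavior(candidates, test_inputs)
chosen = pick_representative(candidates, test_inputs)
print(clusters, chosen)
```

In this toy run the two correct candidates produce identical outputs and land in the same cluster, while each buggy candidate ends up alone. Agreement among independently sampled programs is only a proxy for correctness, so a real system would likely combine it with other signals.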