short-m@k: An inference method that launches k parallel generations, stops all of them as soon as the first m complete, and selects the final answer by majority vote among those m outputs.
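A schematic sketch of short-m@k using Python threads, assuming a hypothetical user-supplied `generate_one` function that runs one full generation and returns its extracted final answer (real implementations would instead interrupt batched decoding inside the inference engine):

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor, as_completed

def short_m_at_k(generate_one, prompt, k=8, m=3):
    """Schematic short-m@k: launch k generations in parallel, keep the
    first m that finish, discard the rest, and majority-vote over the
    m answers.  `generate_one` is a hypothetical stand-in for a call
    to an inference backend."""
    finished = []
    with ThreadPoolExecutor(max_workers=k) as pool:
        futures = [pool.submit(generate_one, prompt) for _ in range(k)]
        for fut in as_completed(futures):
            finished.append(fut.result())
            if len(finished) == m:
                # Stop waiting for the stragglers.  Note: cancel() only
                # prevents not-yet-started tasks from running; a real
                # engine would abort in-flight decoding to save compute.
                for f in futures:
                    f.cancel()
                break
    return Counter(finished).most_common(1)[0][0]
```

With m = 1 this degenerates to "take the first generation to finish"; with m = k it reduces to plain majority voting over all k samples.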
thinking tokens: Intermediate tokens generated by a reasoning model (often enclosed in <think> tags) before the final answer.
pass@k: A metric measuring the probability that at least one correct answer exists within k generated samples.
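In practice pass@k is computed with the standard unbiased estimator (Chen et al., 2021): generate n ≥ k samples, count the c correct ones, and compute 1 − C(n−c, k)/C(n, k), the probability that a size-k subset drawn without replacement contains at least one correct sample. A minimal implementation:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k
    samples drawn without replacement from n generations (c of which
    are correct) is correct.

    pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        # Fewer than k incorrect samples exist, so every size-k
        # subset must contain a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 10 samples of which c = 3 are correct, pass@1 is 1 − 7/10 = 0.3, matching the raw accuracy of a single draw.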
majority voting: An aggregation method that generates multiple samples and selects the most frequent final answer (also known as Self-Consistency).
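The aggregation step can be sketched in a few lines, assuming answers have already been extracted from each sample (e.g. the boxed final value of a math solution):

```python
from collections import Counter

def majority_vote(answers):
    """Self-Consistency aggregation: return the most frequent final
    answer among the sampled generations.  Counter.most_common breaks
    ties by first occurrence (insertion order in Python 3.7+)."""
    return Counter(answers).most_common(1)[0][0]
```

Note that voting compares only the final answers, not the reasoning chains, so samples that reach the same answer via different chains still count toward the same bucket.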
test-time compute: The amount of computational resources (FLOPs, tokens) used during inference to generate a response, often scaled by generating more tokens or samples.
parallel decoding: Generating multiple sequences simultaneously using batching, as opposed to sequential generation.
SFT: Supervised Fine-Tuning—training a model on labeled examples of inputs and desired outputs.
RL: Reinforcement Learning—training method where models learn from rewards/penalties.
backtracking: When a reasoning chain revisits previous steps or attempts to correct itself, often indicating difficulty or error.