Prefix Self-Consistency: The phenomenon where different solution trajectories (correct or incorrect) for the same question share a common, consistent initial reasoning phase.
UPFT: Unsupervised Prefix Fine-Tuning—the proposed method that trains models on the initial tokens of their own generated reasoning traces without filtering for correctness.
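UPFT's data construction can be sketched in a few lines. This is a minimal illustration under stated assumptions: `generate_trace` is a hypothetical stand-in for sampling a reasoning trace from the model, and the prefix length is an arbitrary toy value, not the paper's setting.

```python
def build_upft_dataset(questions, generate_trace, prefix_len=8):
    """Sketch of UPFT data construction: for each question, sample one
    reasoning trace from the model itself and keep only its first
    `prefix_len` tokens. No correctness filtering is applied."""
    dataset = []
    for q in questions:
        trace = generate_trace(q)  # hypothetical model-sampling hook
        dataset.append((q, trace[:prefix_len]))
    return dataset

# Toy usage with a stand-in generator that "tokenizes" on whitespace.
upft_data = build_upft_dataset(
    ["What is 2+2?"],
    lambda q: "First , we add 2 and 2 to get 4 .".split(),
    prefix_len=4,
)
```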
RFT: Rejection Sampling Fine-Tuning—a standard method where a model generates many candidate solutions per question, those whose final answers match the ground truth are kept, and the model is fine-tuned on the surviving solutions.
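For contrast with UPFT, the rejection-sampling filter at the heart of RFT can be sketched as follows. The helper names (`sample_solutions`, the `"answer"`/`"text"` fields) are hypothetical illustrations, not the paper's interface; the key point is the ground-truth filter.

```python
def build_rft_dataset(questions, answers, sample_solutions, n_samples=8):
    """Sketch of RFT data construction: sample several full solutions
    per question and keep only those whose final answer matches the
    ground-truth answer (the rejection-sampling step)."""
    dataset = []
    for q, gold in zip(questions, answers):
        for sol in sample_solutions(q, n_samples):
            if sol["answer"] == gold:  # ground-truth filter
                dataset.append((q, sol["text"]))
    return dataset

# Toy sampler: two candidates, only one with the right final answer.
def toy_sampler(q, n):
    return [{"text": "reasoning ... answer 4", "answer": "4"},
            {"text": "reasoning ... answer 5", "answer": "5"}][:n]

rft_data = build_rft_dataset(["2+2?"], ["4"], toy_sampler, n_samples=2)
```

Note the contrast with UPFT: RFT needs a ground-truth answer per question to do the filtering, while UPFT keeps every prefix unfiltered.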
SFT: Supervised Fine-Tuning—training a model on input-output pairs to maximize the likelihood of the target tokens given the input (equivalently, to minimize the negative log-likelihood of the targets).
Rollout Sampling: Generating the remainder of a sequence from a specific intermediate token position to estimate the likelihood of reaching a correct answer from that state.
Prefix Coverage: The diversity of potential solution paths captured by the initial tokens of generated traces.
Prefix Accuracy: The probability that a given reasoning prefix will lead to a correct final answer.
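Prefix accuracy is naturally estimated with the rollout sampling defined above: sample many completions from the prefix and count how often the final answer is correct. A minimal Monte Carlo sketch, where `sample_completion` and `is_correct` are hypothetical stand-ins for the model and an answer checker:

```python
import random

def estimate_prefix_accuracy(prefix, sample_completion, is_correct,
                             n_rollouts=100, seed=0):
    """Monte Carlo estimate of prefix accuracy: the fraction of rollouts
    continued from `prefix` whose final answer is judged correct."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rollouts):
        completion = sample_completion(prefix, rng)  # hypothetical model hook
        if is_correct(completion):
            hits += 1
    return hits / n_rollouts

# Toy stand-in: a "model" whose rollouts reach the right answer ~80% of the time.
def toy_completion(prefix, rng):
    return "42" if rng.random() < 0.8 else "7"

acc = estimate_prefix_accuracy("Let x denote ...", toy_completion,
                               lambda c: c == "42")
```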
NLL: Negative Log-Likelihood—the negative log of the probability a model assigns to the target tokens; the standard loss function used to train language models.
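For concreteness, the per-token NLL is just the negative log of the model's probability for the target token. A toy computation (the distribution below is an arbitrary made-up example, not real model output):

```python
import math

# Hypothetical next-token distribution from a model (post-softmax).
probs = {"Paris": 0.7, "London": 0.2, "Rome": 0.1}
target = "Paris"

# NLL for one token: -log p(target). A confident correct prediction
# gives a small loss; a low-probability target gives a large loss.
nll = -math.log(probs[target])

# Sequence-level NLL is the sum (or per-token mean) of these terms.
```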
Catastrophic Forgetting: A failure mode where a model loses previously learned capabilities (like instruction following) while learning a new task.