SVD: Singular Value Decomposition—a factorization that breaks a matrix into rotation (U, V) and scaling (Σ) components
Post-training: The stage after pre-training where models are fine-tuned for specific behaviors, including instruction tuning and distillation
Long-CoT Distillation: Training a smaller model to reason by imitating the long 'chain-of-thought' reasoning outputs of a larger, reinforcement-learning-trained model
SVSM: Singular Value Scaling Matrix—a matrix defined by the authors to quantify the ratio of singular values between a post-trained model and its base model across layers
Frobenius Norm: A measure of the 'size' or magnitude of a matrix, calculated as the square root of the sum of the absolute squares of its elements
Base Model: The pre-trained model before any specific alignment or fine-tuning (e.g., Qwen2.5-Math-1.5B)
Post Model: The model resulting from post-training the Base Model (e.g., Qwen2.5-Math-1.5B-Instruct)
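The definitions above can be tied together in a minimal NumPy sketch. The weight matrices here are random stand-ins (real ones would come from the Base and Post checkpoints), and `svsm_row` is an assumed per-layer slice of the SVSM: the element-wise ratio of the Post model's singular values to the Base model's.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for one layer's weight matrix in the Base and
# Post models; real shapes would come from the actual checkpoints.
W_base = rng.standard_normal((8, 6))
W_post = W_base + 0.05 * rng.standard_normal((8, 6))

# SVD: W = U @ diag(s) @ Vt, where U and Vt are orthogonal (rotations)
# and s holds the singular values (scalings).
_, s_base, _ = np.linalg.svd(W_base, full_matrices=False)
_, s_post, _ = np.linalg.svd(W_post, full_matrices=False)

# One row of the Singular Value Scaling Matrix (SVSM) for this layer:
# the element-wise ratio of Post to Base singular values.
svsm_row = s_post / s_base

# Frobenius norm: sqrt of the sum of squared entries, which equals the
# l2 norm of the singular value vector.
fro = np.linalg.norm(W_base, 'fro')
```

Since singular values are non-negative and the Frobenius norm depends only on them, `fro` matches `np.sqrt(np.sum(s_base**2))` up to floating-point error.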