Understanding Post-Training Structural Changes in Large Language Models

📝 Paper Summary

LLM Post-training analysis Mechanistic Interpretability Parameter Space Analysis

Post-training adapts LLMs not by reorganizing their internal geometry, but by applying uniform scaling to singular values and consistent orthogonal rotations to singular vectors, preserving the pre-trained semantic space.

Core Problem

While post-training (instruction tuning, reasoning distillation) is essential for LLM performance, its impact on the internal parameter structure remains a black box, with prior work focusing mostly on behavioral outputs or hidden states rather than weight matrices.

Why it matters:

Current understanding of how models acquire new capabilities (like reasoning) is limited to observing outputs, lacking a structural explanation of how parameters evolve
Treating weight matrices as black boxes hinders the development of more efficient fine-tuning or model merging techniques
The lack of structural insight makes it difficult to distinguish between different types of post-training (e.g., standard instruction tuning vs. reasoning distillation) at the parameter level

Concrete Example: When a base model is instruction-tuned, we know it follows commands better, but we don't know if this requires a complete rewiring of its internal connections. This paper shows it doesn't: the singular vectors of the fine-tuned model are just rotated versions of the base model's, meaning the fundamental semantic relationships are preserved rather than destroyed.

Key Novelty

Spectral Analysis of Post-Training Structural Invariants

Discovers that post-training applies a near-uniform geometric scaling factor to all singular values within a layer, rather than altering the distribution shape
Identifies that the output projection matrix (Wo) in attention layers exhibits anomalously high scaling factors specifically in reasoning models, distinguishing them from standard instruction-tuned models
Proves that left and right singular vectors undergo coordinated orthogonal transformations (rotations), effectively preserving the semantic topology established during pre-training

Evaluation Highlights

Demonstrates robust structural regularities across two distinct post-training regimes: Instruction Tuning (Qwen2.5-Instruct) and Long-CoT Distillation (DeepSeek-R1-Distill)
Identifies a unique spectral signature for reasoning models: the Wo matrix in Self-Attention shows distinctively higher singular value scaling compared to instruction-tuned models
Shows that the normalized Frobenius norm of the orthogonality gap between vector transformations is consistently low, proving the transformations are essentially rigid rotations

Breakthrough Assessment

7/10

Provides the first concrete mathematical laws (uniform scaling + coordinated rotation) describing post-training parameter evolution. While primarily analytical, it offers a strong theoretical foundation for understanding alignment mechanics.

⚙️ Technical Details

Problem Definition

Setting: Analysis of weight matrix transformations between pre-trained Base models and Post-trained (Instruct/Reasoning) models

Inputs: Weight matrices (W) from Self-Attention and FFN layers of LLMs

Outputs: Singular Value Scaling Matrices (SVSM) and Similarity Matrices characterizing the geometric transformation

Pipeline Flow

Model Loading (Base & Post)
Weight Extraction (Attention & FFN matrices)
SVD Decomposition
Metric Calculation (Scaling Factors, Vector Similarity)

System Modules

SVD Analysis

Decompose weight matrices W into U, Sigma, V^T to isolate scaling and directional components

Scaling Analysis

Compute the ratio of singular values between post and base models to identify scaling laws

Rotation Analysis

Measure the similarity and orthogonality of the transformation matrices relating base and post singular vectors

Novel Architectural Elements

Analytical Framework: A mathematical formalism describing post-training as a combination of uniform geometric scaling (temperature control) and coordinated orthogonal rotation (semantic alignment)

Modeling

Base Model: Qwen2.5-Math-1.5B

Training Method: Analysis focuses on existing post-trained models: Instruction Tuning (supervised fine-tuning on instructions) and Long-CoT Distillation (distilling reasoning traces)

Adaptation: Analysis of Full Fine-tuning results (as present in public models)

Trainable Parameters: All parameters (in the analyzed public models)

Training Data:

Not reported in the paper (Analysis relies on pre-released weights)

Compute: Not reported in the paper

Comparison to Prior Work

vs. PiSSA: PiSSA assumes singular values change significantly or need re-optimization; this paper finds singular values undergo uniform scaling, suggesting the distribution shape is invariant
vs. DoRA: DoRA separates magnitude and direction explicitly; this paper confirms structurally that post-training naturally disentangles into scaling (magnitude) and rotation (direction) components
vs. Mechanistic Interpretability [not cited in paper]: Traditional interp focuses on circuits/neurons; this approach looks at global linear algebra properties of entire weight matrices

Limitations

Analysis is limited to supervised post-training (Instruction Tuning, Distillation); RLHF/PPO not deeply analyzed in the main text (mentioned briefly in appendix)
Focuses primarily on Qwen and DeepSeek model families; generalization to all architectures (e.g., Mixture-of-Experts) is not fully explored in main text
Descriptive rather than prescriptive: explains *what* happens to parameters but does not propose a new training method based on these findings

Reproducibility

The paper analyzes publicly available model weights (Qwen2.5-Math-1.5B, Qwen2.5-Math-1.5B-Instruct, DeepSeek-R1-Distill-Qwen-1.5B). The analysis method (SVD) is standard, but no specific code repository is provided for the analysis scripts.

📊 Experiments & Results

Evaluation Setup

Comparative spectral analysis of weight matrices between Base and Post-trained models

Benchmarks:

Qwen2.5-Math-1.5B Family (Model Weight Analysis)
DeepSeek-R1-Distill-Qwen-1.5B (Model Weight Analysis)

Metrics:

Singular Value Scaling Factor (alpha)
Normalized Frobenius Norm (NF) of Orthogonality Gap
Similarity Matrix heatmaps
Statistical methodology: Not explicitly reported in the paper

Main Takeaways

Post-training preserves the relative distribution of singular values found in the base model, applying only a uniform scaling factor (alpha) across the spectrum.
Reasoning models (Long-CoT distilled) can be distinguished from standard Instruct models by a distinctively larger scaling factor in the Output Projection (Wo) matrices of attention layers.
The change in singular vectors (both left and right) between Base and Post models is well-approximated by a single orthogonal rotation matrix, implying the semantic space is rotated but not distorted.
The 'similarity matrix' between Base and Post singular vectors is nearly identical for U (left) and V (right) vectors, confirming coordinated transformation of input and output subspaces.

📚 Prerequisite Knowledge

Prerequisites

Linear Algebra (Singular Value Decomposition, Orthogonal Matrices)
Transformer Architecture (Self-Attention, FFN)
LLM Training Pipeline (Pre-training vs. Post-training)

Key Terms

SVD: Singular Value Decomposition—a method to break a matrix down into rotation (U, V) and scaling (Sigma) components

Post-training: The stage after pre-training where models are fine-tuned for specific behaviors, including instruction tuning and distillation

Long-CoT Distillation: Training a smaller model to reason by imitating the long 'chain-of-thought' reasoning outputs of a larger, reinforcement-learning-trained model

SVSM: Singular Value Scaling Matrix—a matrix defined by the authors to quantify the ratio of singular values between a post-trained model and its base model across layers

Frobenius Norm: A measure of the 'size' or magnitude of a matrix, calculated as the square root of the sum of the absolute squares of its elements

Base Model: The pre-trained model before any specific alignment or fine-tuning (e.g., Qwen2.5-Math-1.5B)

Post Model: The model resulting from post-training the Base Model (e.g., Qwen2.5-Math-1.5B-Instruct)