Bias Manifold: The local linear subspace spanned by the dominant principal components of the model's generated hidden states, representing safe/heuristic behaviors
Null Space: The geometric region orthogonal to the bias manifold containing high-complexity reasoning paths usually inaccessible to greedy search
Effective Rank: A continuous measure of matrix dimensionality derived from the Shannon entropy of singular values, used here as a proxy for reasoning complexity
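As a concrete illustration of this definition, the following sketch computes effective rank as the exponential of the Shannon entropy of the normalized singular-value distribution. The function name `effective_rank` is chosen for illustration and is not from the source.

```python
import numpy as np

def effective_rank(M: np.ndarray) -> float:
    """Effective rank: exp of the Shannon entropy of normalized singular values."""
    s = np.linalg.svd(M, compute_uv=False)  # singular values of M
    p = s / s.sum()                         # normalize into a distribution
    p = p[p > 0]                            # guard against log(0)
    H = -np.sum(p * np.log(p))              # Shannon entropy of the spectrum
    return float(np.exp(H))                 # continuous dimensionality measure
```

For intuition: the 4x4 identity matrix has effective rank 4 (a flat spectrum), while any rank-1 matrix has effective rank 1 (all mass on one singular value); spectrum contraction would show up as this value decreasing over training.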
Spectrum Contraction: The phenomenon where RL optimization reduces the effective rank of generated trajectories, trapping the model in low-dimensional patterns
SOE: Spectral Orthogonal Exploration—a method to generate synthetic data by projecting probes into the null space of a teacher model
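A minimal sketch of the orthogonal projection underlying this idea, under the definitions above: the bias manifold is spanned by the top-k principal directions of the hidden states, and a probe is pushed into the null space by subtracting its component in that subspace (v - U_k U_k^T v). The function name and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def project_to_null_space(v: np.ndarray, hidden: np.ndarray, k: int) -> np.ndarray:
    """Remove the component of probe v that lies in the top-k principal
    subspace (the 'bias manifold') of the hidden-state matrix (rows = samples),
    leaving only the orthogonal null-space component."""
    centered = hidden - hidden.mean(axis=0)          # center the hidden states
    U, _, _ = np.linalg.svd(centered.T @ centered)   # principal directions (columns)
    U_k = U[:, :k]                                   # dominant subspace basis
    return v - U_k @ (U_k.T @ v)                     # orthogonal-complement residual
```

The residual is, by construction, orthogonal to every dominant principal direction, so it points into the region the glossary calls the null space.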
GRPO: Group Relative Policy Optimization—an RL algorithm that normalizes advantages within a group of samples for the same input to reduce variance
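The group-relative normalization can be sketched in a few lines: each sample's advantage is its reward minus the group mean, divided by the group standard deviation. The epsilon term is a standard numerical-stability assumption, not a detail taken from the source.

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """GRPO-style advantages: standardize rewards within a group of samples
    drawn for the same input, so the baseline is the group itself."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```

Because the baseline is computed from the group rather than a learned value function, the resulting advantages have near-zero mean and unit scale within each group, which reduces gradient variance across inputs of differing difficulty.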
Cold Start: The initial supervised fine-tuning phase using high-quality synthetic data to initialize the policy before RL
SFT: Supervised Fine-Tuning