Angle Concentration: The degree to which token hidden-state vectors point in similar directions, measured by their pairwise cosine similarity; high concentration means the vectors are directionally aligned, which correlates with larger gradient norms
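As a concrete illustration, such a concentration score could be computed as the mean pairwise cosine similarity over a matrix of hidden states. The function name and the choice to average only off-diagonal pairs are assumptions of this sketch, not a fixed definition from the source.

```python
import numpy as np

def angle_concentration(hidden_states: np.ndarray) -> float:
    """Mean pairwise cosine similarity across token hidden states.

    hidden_states: array of shape (num_tokens, hidden_dim).
    Averaging over off-diagonal pairs only is an illustrative choice.
    """
    # Normalize each hidden-state vector to unit length.
    unit = hidden_states / np.linalg.norm(hidden_states, axis=1, keepdims=True)
    sims = unit @ unit.T  # pairwise cosine similarities
    n = sims.shape[0]
    # Exclude the diagonal (each vector's similarity with itself is 1).
    return float((sims.sum() - np.trace(sims)) / (n * (n - 1)))
```

Vectors that all point the same way score 1; mutually orthogonal vectors score 0.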
GRPO: Group Relative Policy Optimization—a reinforcement learning algorithm for LLMs that normalizes rewards within a group of outputs to stabilize training
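The group-relative normalization can be sketched in a few lines: each sampled output's reward is shifted by the group mean and scaled by the group standard deviation. The epsilon guard against a zero-variance group is an assumption of this sketch.

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of sampled outputs (GRPO-style)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps guards against division by zero when all rewards are equal.
    return [(r - mean) / (std + eps) for r in rewards]
```

An output rewarded exactly at the group mean gets advantage 0; above-mean outputs get positive advantages, below-mean outputs negative ones.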
RFT: Reinforcement Fine-tuning—using RL algorithms (like PPO or GRPO) to improve model performance after initial supervised training
Pre-filling: The initial phase of LLM inference, in which the prompt tokens are processed in parallel to populate the key-value cache; much faster per token than autoregressively generating new tokens (decoding)
Hidden States: The internal vector representations of tokens within the neural network layers
Frobenius Norm: A measure of the magnitude of a matrix (square root of the sum of the absolute squares of its elements), used here to quantify gradient size
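The Frobenius norm follows directly from the definition above; the matrix here is an arbitrary example standing in for a gradient.

```python
import numpy as np

G = np.array([[3.0, 0.0],
              [0.0, 4.0]])

# Square root of the sum of squared entries: sqrt(9 + 16) = 5.0
frob = np.sqrt(np.sum(G ** 2))

# NumPy's built-in agrees:
assert np.isclose(frob, np.linalg.norm(G, ord="fro"))
```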
SiLU: Sigmoid Linear Unit—an activation function that computes x · sigmoid(x), used in modern LLMs (like Llama and Qwen)
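SiLU is simply the input multiplied by its own sigmoid, which reduces to a one-line function:

```python
import math

def silu(x: float) -> float:
    """SiLU / swish activation: x * sigmoid(x) = x / (1 + e^(-x))."""
    return x / (1.0 + math.exp(-x))
```

silu(0.0) is 0; for large positive x the function approaches x (near-linear), while large negative inputs are squashed toward 0.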
Intra-segment concentration: The similarity of hidden states within a specific part of the input (e.g., within the question text itself)
Inter-segment concentration: The similarity of hidden states between different parts of the input (e.g., between the system prompt and the question)
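The two quantities above can be contrasted with a single helper that averages cosine similarities between two blocks of hidden states: passing a segment against itself gives intra-segment concentration, and passing two different segments gives inter-segment concentration. The segment shapes and random states below are hypothetical stand-ins; note that the intra-segment call here includes each vector's similarity with itself.

```python
import numpy as np

def mean_cosine(A: np.ndarray, B: np.ndarray) -> float:
    """Average cosine similarity over all row pairs of A and B."""
    Au = A / np.linalg.norm(A, axis=1, keepdims=True)
    Bu = B / np.linalg.norm(B, axis=1, keepdims=True)
    return float((Au @ Bu.T).mean())

# Hypothetical hidden states for two prompt segments.
rng = np.random.default_rng(0)
system_prompt = rng.normal(size=(5, 16))   # 5 system-prompt tokens
question = rng.normal(size=(7, 16))        # 7 question tokens

intra = mean_cosine(question, question)        # intra-segment concentration
inter = mean_cosine(system_prompt, question)   # inter-segment concentration
```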