anisotropy: The property of an activation distribution whose energy is concentrated along a few specific directions rather than spread uniformly across all dimensions
mean bias: A coherent, non-zero feature-wise mean component in activations that shifts token representations in a common direction
rank-one: A matrix or component that can be represented as the outer product of two vectors; here, the mean bias acts as a single dominant direction
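The rank-one structure of the mean bias can be made concrete: shifting every token by the same feature-wise mean adds the outer product of an all-ones vector and the mean vector to the activation matrix. A minimal numpy sketch, using a synthetic activation matrix (all shapes and values here are illustrative assumptions, not the paper's data):

```python
import numpy as np

# Hypothetical activations: 128 tokens x 64 features, all shifted by a
# common feature-wise mean direction mu (synthetic, for illustration).
rng = np.random.default_rng(0)
mu = rng.normal(size=64)                      # coherent mean-bias direction
X = rng.normal(size=(128, 64)) + mu           # every token shares the shift

# The mean bias is a rank-one component: ones ⊗ mean(X)
mean_component = np.outer(np.ones(128), X.mean(axis=0))
assert np.linalg.matrix_rank(mean_component) == 1

# Subtracting it (mean-centering) removes exactly this single direction
X_centered = X - X.mean(axis=0, keepdims=True)
```

Because the component is rank one, it can be stored and reapplied exactly with just one vector per feature dimension.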
FP4: A 4-bit floating-point data format used for compressing model weights and activations to reduce memory and compute costs
blockwise quantization: Dividing a tensor into small blocks and assigning a separate scaling factor to each block to handle varying value ranges
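A toy symmetric blockwise quantizer illustrates the idea. The integer grid below is a stand-in for an FP4 code book (the function names, block size, and level count are illustrative assumptions):

```python
import numpy as np

def blockwise_quantize(x, block_size=32, levels=7):
    """Toy symmetric blockwise quantizer.

    Each block of `block_size` values gets its own scale, so an extreme
    value in one block cannot distort the precision of the others.
    """
    x = x.reshape(-1, block_size)
    scales = np.abs(x).max(axis=1, keepdims=True) / levels
    scales[scales == 0] = 1.0                 # avoid division by zero
    q = np.round(x / scales).clip(-levels, levels)
    return q, scales

def blockwise_dequantize(q, scales):
    return (q * scales).ravel()

x = np.linspace(-1.0, 1.0, 64)
q, s = blockwise_quantize(x)
x_hat = blockwise_dequantize(q, s)
```

The per-block rounding error is bounded by half of that block's scale, so blocks with small ranges keep fine resolution even when other blocks contain large values.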
GeMM: General Matrix Multiply—the core dense linear-algebra operation underlying linear (fully connected) and attention layers in neural networks
BF16: Brain Floating Point 16—a 16-bit format with a wide dynamic range, commonly used as the high-precision baseline for training
outlier: Activation values whose magnitudes far exceed the bulk of the distribution, inflating quantization scales and wasting precision on typical values
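The scale distortion caused by a single outlier is easy to demonstrate with a toy per-tensor symmetric quantizer (a 7-level integer grid standing in for a low-bit format; the values are an assumed example):

```python
import numpy as np

x = np.full(64, 0.01)           # typical small activations
x[0] = 10.0                     # one extreme outlier
scale = np.abs(x).max() / 7     # a single per-tensor scale: set by the outlier
q = np.round(x / scale)
# q[0] == 7, but every other value rounds to 0 — the outlier has wiped out
# all precision for the typical values
```

Blockwise scaling mitigates exactly this failure mode: the outlier inflates only the scale of its own block.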
SVD: Singular Value Decomposition—a mathematical method to factorize a matrix into singular vectors and values, often used to identify dominant directions
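When a coherent mean bias dominates the activations, the top singular direction recovered by SVD aligns with it. A small numpy sketch on synthetic data (the matrix sizes, noise scale, and thresholds are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = rng.normal(size=16)                         # assumed mean-bias direction
X = rng.normal(scale=0.1, size=(256, 16)) + mu   # mean-dominated activations

U, S, Vt = np.linalg.svd(X, full_matrices=False)
top_dir = Vt[0]                                  # dominant right singular vector

# Cosine similarity between the top singular direction and the mean direction
alignment = abs(top_dir @ mu) / np.linalg.norm(mu)
# alignment is near 1, and S[0] dwarfs S[1], because the rank-one mean
# component carries most of the matrix's energy
```

This is why SVD is a natural diagnostic for detecting the rank-one mean bias described above.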