PTQ: Post-Training Quantization—reducing model size after training without full re-training
OPTQ: A quantization algorithm that quantizes weights one by one, updating remaining weights to compensate for error using second-order (Hessian) information
Perplexity: A measure of how well a probability model predicts a sample, equal to the exponential of the average negative log-likelihood of the observed tokens; lower values indicate better performance
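As an illustration of the definition above (a minimal sketch, not tied to any particular evaluation harness), perplexity can be computed directly from the probabilities a model assigned to each observed token:

```python
import numpy as np

def perplexity(token_probs):
    """Perplexity = exp of the average negative log-likelihood
    of the probabilities the model assigned to each observed token."""
    nll = -np.log(token_probs)
    return float(np.exp(nll.mean()))

# A model that assigns probability 1/4 to every observed token
# has a perplexity of 4: it is "as confused" as a uniform
# choice among 4 options.
print(perplexity(np.array([0.25, 0.25, 0.25, 0.25])))
```

This makes the "lower is better" reading concrete: probabilities closer to 1 on the observed tokens drive the average negative log-likelihood, and hence the perplexity, down toward 1.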
Hessian matrix: A square matrix of second-order partial derivatives, used here to measure the sensitivity of the loss function to weight changes
ALS: Alternating Least Squares—an optimization algorithm that iteratively solves for one variable while holding others fixed
Affine transformation: A mapping of the form y = Ax + b (a linear map plus a translation), used in layers like LayerNorm; Z-FOLD fuses parameters into these values
Kronecker product: A matrix operation that combines two matrices into a larger block matrix, used here to approximate the Hessian matrix compactly for quantization
LayerNorm: Layer Normalization—a technique to normalize the inputs across the features, containing trainable scale (gamma) and shift (beta) parameters
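The LayerNorm definition above can be sketched in a few lines (a minimal NumPy version, assuming the standard formulation with a small eps for numerical stability; gamma and beta are the trainable scale and shift parameters the entry refers to):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each row across the feature dimension to zero mean
    # and unit variance, then apply the trainable affine transform:
    # scale by gamma, shift by beta.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[1.0, 2.0, 3.0, 4.0]])
out = layer_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out)  # zero mean, (near-)unit variance per row
```

Because the final step is exactly the affine form y = gamma * x_norm + beta, gamma and beta are natural places to fuse extra per-channel factors, which is the mechanism the Z-FOLD entry above alludes to.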