PTQ: Post-Training Quantization—compressing a model after training without full re-training
L_infinity norm: The maximum absolute value among the entries of a vector or matrix
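In NumPy terms (note the caveat in the comment about the matrix case):

```python
import numpy as np

v = np.array([3.0, -7.5, 2.0])
linf = np.abs(v).max()   # 7.5, the entrywise maximum absolute value
# For vectors this matches np.linalg.norm(v, ord=np.inf); for a matrix,
# ord=np.inf instead gives the induced norm (maximum absolute row sum),
# so use np.abs(M).max() for the entrywise version.
```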
Proximal Gradient Descent: An optimization algorithm for objectives with a non-differentiable term (such as an L_infinity regularizer); it alternates gradient steps on the smooth part of the objective with proximal steps on the non-smooth part
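A minimal sketch of the idea, shown with the simpler L1 regularizer, whose proximal operator is the closed-form soft-thresholding rule (the L_infinity case needs a projection instead); `proximal_gradient` and its arguments are illustrative names, not from any library:

```python
import numpy as np

def soft_threshold(x, t):
    # proximal operator of t * ||.||_1: shrink each entry toward zero by t
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def proximal_gradient(A, b, lam, step, iters=500):
    # minimize 0.5*||Ax - b||^2 + lam*||x||_1 by alternating a gradient
    # step on the smooth term with a proximal step on the non-smooth term
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

The step size must satisfy `step <= 1/L`, where L is the largest eigenvalue of `A.T @ A`; with `A` the identity, the method reduces to soft-thresholding `b` directly.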
RTN: Rounding-to-Nearest—the simplest quantization method that just rounds values to the nearest grid point
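The rule can be sketched in a few lines; the symmetric signed grid and the `rtn_quantize` helper are illustrative choices, not a fixed specification:

```python
import numpy as np

def rtn_quantize(w, n_bits=4):
    # symmetric uniform grid: the scale maps the largest |w| to the top level
    qmax = 2 ** (n_bits - 1) - 1              # e.g. 7 for 4-bit signed
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale                          # dequantized weights

w = np.array([0.9, -0.31, 0.05, 0.44])
wq = rtn_quantize(w, n_bits=4)
```

Every value lands within half a grid step of its original, which is cheap but ignores how much each weight actually matters to the layer's output.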
AWQ: Activation-aware Weight Quantization—a method that scales weights based on activation magnitude to protect important weights
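A toy illustration of the core trick, with made-up per-channel scales standing in for AWQ's activation statistics: scaling weights up and activations down is exact in full precision, so only the rounding behavior changes, and channels scaled up occupy more of the grid's range:

```python
import numpy as np

def rtn(w, n_bits=4):
    # plain round-to-nearest on a symmetric uniform grid
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=64)          # weights feeding one output channel
x = rng.normal(size=(8, 64))     # a small batch of activations
s = 1.0 + rng.random(64)         # illustrative per-channel scales; AWQ
                                 # derives these from activation magnitudes

# The transform is lossless before quantization: scaling weights up by s
# and activations down by s leaves the layer's output unchanged.
assert np.allclose(x @ w, (x / s) @ (w * s))

# Quantizing the scaled weights instead of the raw ones is the AWQ move:
y_quant = (x / s) @ rtn(w * s)
```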
OPTQ: The published name of GPTQ, which extends Optimal Brain Quantization (OBQ)—a method that quantizes weights one at a time, updating the remaining weights to compensate for the rounding error
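A simplified single-row sketch of the compensation loop; the real algorithm works with a Cholesky factorization so the inverse Hessian tracks only the not-yet-quantized weights, whereas the fixed `Hinv` here is an approximation for illustration:

```python
import numpy as np

def greedy_quantize(w, Hinv, scale, qmax=7):
    # quantize entries of w left to right on a uniform grid; after each,
    # redistribute the rounding error onto the remaining entries through
    # the inverse Hessian (the OBQ-style compensation rule)
    w = w.copy()
    for i in range(len(w)):
        q = np.clip(np.round(w[i] / scale), -qmax, qmax) * scale
        err = (w[i] - q) / Hinv[i, i]
        w[i] = q
        w[i + 1:] -= err * Hinv[i, i + 1:]
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(128, 8))             # calibration activations
H = X.T @ X / 128 + 1e-2 * np.eye(8)      # damped Hessian proxy
w = rng.normal(size=8)
wq = greedy_quantize(w, np.linalg.inv(H), scale=0.25)
```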
Rank-deficient: A matrix property where rows/columns are not linearly independent, implying the associated system of equations has no unique solution
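A quick check with NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])        # second row = 2 * first row
rank = np.linalg.matrix_rank(A)  # 1, less than 2: A is rank-deficient
```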
Perplexity: A measure of how well a probability model predicts a sample, computed as the exponential of the average negative log-likelihood; lower values indicate better performance
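The formula in code; `perplexity` is an illustrative helper operating on the probabilities the model assigned to the observed tokens, not a library function:

```python
import numpy as np

def perplexity(token_probs):
    # exponential of the mean negative log-likelihood of the observed tokens
    return float(np.exp(-np.mean(np.log(token_probs))))

# a model that assigns probability 0.25 to every token has perplexity 4
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```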