PTQ: Post-Training Quantization—compressing a pre-trained model to lower precision (e.g., 8-bit, 4-bit) without full retraining.
QAT: Quantization-Aware Training, which simulates quantization during training so the model can adapt to it; this usually yields better accuracy than PTQ but requires full training resources.
Coordinate Descent: An optimization algorithm that minimizes a function by successively minimizing along coordinate directions (one variable at a time).
Hessian: A square matrix of second-order partial derivatives of a scalar-valued function; commonly used in optimization to characterize curvature, but expensive to compute and invert for large models.
ViT: Vision Transformer—a model architecture based on the Transformer mechanism applied to image patches.
Bit-code: The integer representation of a weight in a quantized model (e.g., an integer from -8 to 7 for 4-bit signed quantization).
Calibration data: A small set of real data samples used to estimate quantization parameters (such as scales and zero-points) after training, without using the full training set.
Greedy selection: Choosing the next variable to update based on which one offers the maximum immediate reduction in the objective function.
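
Several of the terms above (bit-code, calibration data, coordinate descent, greedy selection) can be seen working together in a small NumPy sketch. It is illustrative only and not the method of any particular paper: the function names, the single per-tensor scale, the squared output-error objective, and the restriction to moves of one integer step are all assumptions made for this example. The weights are first rounded to the nearest 4-bit signed bit-code; then, at each step, the one coordinate whose bit-code change of +1 or -1 reduces the error on the calibration inputs the most is updated.

```python
import numpy as np


def quantize_rtn(w, scale, bits=4):
    """Round-to-nearest bit-codes, e.g. integers in [-8, 7] for 4-bit signed."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(w / scale), qmin, qmax).astype(np.int64)


def output_error(X, w, q, scale):
    """Squared error between quantized and full-precision outputs on X."""
    return float(np.sum((X @ (scale * q - w)) ** 2))


def greedy_coordinate_descent(X, w, q, scale, bits=4, max_steps=1000):
    """Greedily move one bit-code by +/-1 per step while the error decreases.

    X is the (hypothetical) calibration matrix, one sample per row.
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = q.copy()
    r = X @ (scale * q - w)        # current output residual
    h = np.sum(X ** 2, axis=0)     # squared column norms of X
    for _ in range(max_steps):
        g = X.T @ r
        # Exact change in squared error if coordinate i moves by +1 or -1.
        delta_up = 2.0 * scale * g + scale ** 2 * h
        delta_down = -2.0 * scale * g + scale ** 2 * h
        # Forbid moves that would leave the representable range.
        delta_up[q >= qmax] = np.inf
        delta_down[q <= qmin] = np.inf
        i_up, i_down = int(np.argmin(delta_up)), int(np.argmin(delta_down))
        if delta_up[i_up] <= delta_down[i_down]:
            i, step, best = i_up, 1, delta_up[i_up]
        else:
            i, step, best = i_down, -1, delta_down[i_down]
        if best >= -1e-12:         # no single move helps: stop
            break
        q[i] += step
        r += scale * step * X[:, i]
    return q
```

The sketch never forms or inverts a Hessian; it only needs the residual and the per-column norms of the calibration matrix, which is why each step is cheap. Because every accepted move strictly lowers the error, the result is never worse than the round-to-nearest starting point on the calibration data.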