PTQ: Post-Training Quantization—converting a pre-trained model to lower precision without full retraining, using only a small calibration dataset
Hessian Matrix: A square matrix of second-order partial derivatives of a scalar-valued function, describing the local curvature of the loss landscape
Fisher Information Matrix (FIM): The expected outer product of the gradients of the log-likelihood; often used in quantization metrics as a tractable approximation to the Hessian, but that approximation relies on assumptions that may not hold for ViTs
GELU: Gaussian Error Linear Unit, a smooth activation function used in ViTs whose output distribution is difficult to quantize because it is highly asymmetric: a short, bounded negative tail alongside a heavy positive skew
Block-Reconstruction: A PTQ strategy that optimizes quantization parameters block-by-block to minimize the error between the quantized block's output and the original block's output
AdaRound: Adaptive Rounding—a method to determine whether to round weights up or down to minimize task loss, rather than just rounding to the nearest integer
BRECQ: Block Reconstruction Quantization—a state-of-the-art PTQ method for CNNs that uses Hessian-guided metrics
Jacobian Matrix: The matrix of all first-order partial derivatives of a vector-valued function
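
To make the GELU, AdaRound, and Block-Reconstruction entries concrete, here is a minimal sketch in plain Python. It is illustrative only: the toy one-neuron "layer", the weights, the scale, and the calibration inputs are all made-up assumptions, and the exhaustive up/down search is a stand-in for AdaRound's actual procedure, which learns the rounding decisions through a continuous relaxation rather than by brute force. The point it shows is that rounding every weight to the nearest grid value does not necessarily minimize the error in the layer's output on calibration data.

```python
import itertools
import math


def gelu(x):
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def layer_output(weights, x):
    """Toy one-neuron layer (hypothetical): dot product of weights and input."""
    return sum(w * xi for w, xi in zip(weights, x))


def reconstruction_error(w_fp, w_q, calib):
    """Sum of squared differences between full-precision and quantized outputs."""
    return sum(
        (layer_output(w_fp, x) - layer_output(w_q, x)) ** 2 for x in calib
    )


def round_nearest(weights, scale):
    """Baseline: snap each weight to the nearest multiple of `scale`."""
    return [round(w / scale) * scale for w in weights]


def best_up_down(weights, scale, calib):
    """Brute-force stand-in for adaptive rounding: try every floor/ceil
    combination and keep the one with the lowest reconstruction error."""
    choices = [
        (math.floor(w / scale) * scale, math.ceil(w / scale) * scale)
        for w in weights
    ]
    best = min(
        itertools.product(*choices),
        key=lambda cand: reconstruction_error(weights, cand, calib),
    )
    return list(best)


# Made-up data: calibration inputs are GELU activations of arbitrary values.
raw_inputs = [[-2.0, 0.5, 1.5, 3.0], [0.1, -0.7, 2.2, 0.9], [1.0, 1.0, -1.0, 0.3]]
calib = [[gelu(v) for v in row] for row in raw_inputs]
weights = [0.26, 0.26, -0.24, 0.74]
scale = 0.5

w_nearest = round_nearest(weights, scale)
w_adaptive = best_up_down(weights, scale, calib)
err_nearest = reconstruction_error(weights, w_nearest, calib)
err_adaptive = reconstruction_error(weights, w_adaptive, calib)
print("nearest :", w_nearest, err_nearest)
print("adaptive:", w_adaptive, err_adaptive)
```

Because round-to-nearest is itself one of the floor/ceil combinations, the searched solution can never have a higher reconstruction error than the baseline. Exhaustive search scales as 2^n in the number of weights, which is why practical methods optimize the rounding decisions instead of enumerating them.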