PTQ: Post-Training Quantization, the process of converting a pre-trained model's weights and activations to lower-bit-width integers without full retraining, using only a small calibration dataset to set the quantization parameters
ViT: Vision Transformer, a model architecture that applies Transformer self-attention directly to sequences of image patches
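LayerNorm: Layer Normalization, a technique that normalizes each token's activations across its feature (channel) dimension; in ViTs, the inputs to LayerNorm are known to vary strongly from channel to channel, which makes them hard to quantize with a single per-tensor scale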
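GeLU: Gaussian Error Linear Unit, an activation function used in ViTs whose output is bounded below (at about -0.17) but unbounded above, so post-GeLU activations have an asymmetric distribution with a much wider positive range than negative range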
Hessian-guided metric: A calibration criterion that selects quantization parameters (such as scaling factors) by using second-order (Hessian) information about the curvature of the loss to estimate, and minimize, how much quantization perturbs the final task loss
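Bit sparsity: The observation that for non-normal (e.g., heavily skewed or long-tailed) distributions, certain bit positions of the quantized values (the MSBs or LSBs) are rarely used or carry redundant information, which can be exploited for compression or specialized scaling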
Fully quantized: Describes a model in which all operations, including non-linear ones such as Softmax and LayerNorm, are executed with integer arithmetic
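The PTQ and GeLU entries can be made concrete with a small NumPy sketch. It assumes simple min-max calibration and per-tensor uniform quantization (an illustrative choice, not the method of any particular PTQ paper), uses the tanh approximation of GeLU, and feeds it synthetic standard-normal pre-activations. It compares an asymmetric (zero-point) quantizer against a symmetric one on post-GeLU activations.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GeLU; its minimum is about -0.17 near x = -0.75
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def calibrate(calib, n_bits=8, symmetric=False):
    """Min-max calibration: derive (scale, zero_point, qmin, qmax) from calibration data."""
    if symmetric:
        # Symmetric range [-max|x|, +max|x|]: half the codes cover negatives
        qmax = 2 ** (n_bits - 1) - 1
        scale = float(np.abs(calib).max()) / qmax
        return scale, 0, -qmax, qmax
    # Asymmetric range [min, max] mapped onto unsigned codes [0, 2^b - 1]
    qmax = 2 ** n_bits - 1
    lo, hi = float(calib.min()), float(calib.max())
    scale = (hi - lo) / qmax
    zero_point = int(round(-lo / scale))
    return scale, zero_point, 0, qmax

def fake_quant(x, scale, zero_point, qmin, qmax):
    # Quantize to integer codes, then dequantize back to float to measure the error
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

rng = np.random.default_rng(0)
calib_acts = gelu(rng.normal(size=(32, 768)))   # small synthetic "calibration set"
test_acts = gelu(rng.normal(size=(256, 768)))   # held-out synthetic activations

mse = {}
for name, sym in [("asymmetric", False), ("symmetric", True)]:
    params = calibrate(calib_acts, symmetric=sym)
    mse[name] = float(np.mean((test_acts - fake_quant(test_acts, *params)) ** 2))
```

Because post-GeLU values barely go below zero, the symmetric quantizer spends almost half of its codes on a range that is never used, so its step size is roughly twice as large and its error correspondingly higher.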
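The LayerNorm entry's point about inter-channel variation can be illustrated with a synthetic LayerNorm input. The data below is an assumption for illustration only: most channels are standard normal, while a handful of hypothetical "outlier" channels have a 50x larger spread. The sketch compares one per-tensor scale against one scale per channel.

```python
import numpy as np

def quant_dequant(x, scale, n_bits=8):
    # Symmetric uniform quantization; `scale` may be a scalar or broadcast per channel
    qmax = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
tokens, channels = 197, 768

# Synthetic LayerNorm input: 8 outlier channels with a 50x larger spread (assumption)
channel_std = np.ones(channels)
channel_std[:8] = 50.0
x = rng.normal(size=(tokens, channels)) * channel_std

qmax = 127
per_tensor_scale = np.abs(x).max() / qmax                         # one scale overall
per_channel_scale = np.abs(x).max(axis=0, keepdims=True) / qmax   # shape (1, channels)

err_tensor = float(np.mean((x - quant_dequant(x, per_tensor_scale)) ** 2))
err_channel = float(np.mean((x - quant_dequant(x, per_channel_scale)) ** 2))
```

With a single scale, the outlier channels set the step size and the ordinary channels collapse onto a few codes; per-channel scales avoid this, at the cost of extra scale parameters that complicate integer-only LayerNorm.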
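The idea behind a Hessian-guided metric can be shown on a toy problem where the Hessian is known exactly: a linear least-squares model, whose loss is exactly quadratic in the weights. This is a sketch of the general second-order criterion, not the specific Hessian approximation any ViT PTQ method uses; the data, the 4-bit setting, and the candidate scale grid are all assumptions. Near the full-precision optimum the gradient is about zero, so the loss increase from a weight perturbation delta is about 0.5 * delta^T H delta.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 512, 16
# Correlated synthetic features make the Hessian far from the identity (assumption)
mix = rng.normal(size=(d, d))
X = rng.normal(size=(n, d)) @ mix
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Full-precision weights: the least-squares minimum, so the gradient is ~0 there
w_fp, *_ = np.linalg.lstsq(X, y, rcond=None)

def loss(w):
    return 0.5 * np.mean((X @ w - y) ** 2)

H = X.T @ X / n  # exact Hessian of loss() for this linear model

def quantize(w, scale, n_bits=4):
    qmax = 2 ** (n_bits - 1) - 1
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

# Candidate scaling factors around the plain min-max choice (hypothetical grid)
candidates = np.abs(w_fp).max() / 7 * np.linspace(0.3, 1.2, 91)

def pick(metric):
    # Choose the scale whose quantization error `delta` minimizes the given metric
    return min(candidates, key=lambda s: metric(quantize(w_fp, s) - w_fp))

s_mse = pick(lambda delta: np.sum(delta ** 2))    # distance-based metric
s_hess = pick(lambda delta: delta @ H @ delta)    # Hessian-weighted metric
```

The plain squared-error metric treats every weight equally, whereas the Hessian-weighted metric penalizes errors along directions in which the loss is sharply curved, so here it selects the scale with the smaller true loss increase.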
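The bit-sparsity observation can be checked on a skewed distribution such as softmax attention probabilities. This sketch uses synthetic attention logits (the head count, token count, and logit spread are assumptions), applies 8-bit unsigned min-max quantization, and measures how often each bit position is set across all codes.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic attention logits: 12 heads x 197 tokens x 197 tokens (assumption)
logits = rng.normal(scale=3.0, size=(12, 197, 197))
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)

# 8-bit unsigned uniform quantization with a single min-max scale for the tensor
n_bits = 8
scale = probs.max() / (2 ** n_bits - 1)
codes = np.clip(np.round(probs / scale), 0, 2 ** n_bits - 1).astype(np.uint8)

# Fraction of codes in which each bit position is set (index 7 = MSB)
bit_usage = [float((np.bitwise_and(codes, 1 << b) > 0).mean()) for b in range(n_bits)]
zero_fraction = float((codes == 0).mean())
```

Because most probabilities are tiny, the large majority of codes are zero and the upper bits are almost never set, which is the kind of redundancy that the bit-sparsity entry describes.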