PTQ: Post-Training Quantization—compressing a model after training using only a small calibration set, without full retraining.
AdaLog: Adaptive Logarithm Quantizer—the proposed method that learns an optimal base for logarithmic quantization.
FPCS: Fast Progressive Combining Search—a search strategy that iteratively refines the search grid for quantization parameters.
Bias Reparameterization: A technique to absorb quantization errors or shift distributions (like making GELU outputs non-negative) by adjusting bias terms.
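Power-law distribution: A distribution in which the frequency of a value falls off as a power of that value, so most of the mass sits near zero while a long tail of rare, larger values remains; common in post-Softmax and post-GELU activations.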
Softmax: Activation function that converts logits to probabilities; in ViTs, these outputs often have a power-law distribution.
GELU: Gaussian Error Linear Unit, an activation function used in ViTs; its outputs are unbounded above but bounded below at roughly -0.17, so negative values occupy only a narrow range.
De-quantization: The process of mapping integer indices back to approximate real values (or performing integer operations that produce the same result without explicitly reconstructing those values).
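
To make the AdaLog, FPCS, and De-quantization entries concrete, here is a minimal sketch of logarithmic quantization with a configurable base, its de-quantization, and a base selection step. It is an illustration under stated assumptions, not the authors' implementation: the function names (log_quantize, log_dequantize, pick_base), the scale parameter, and the 4-bit default are all hypothetical, and the plain search over a list of candidate bases is a simple stand-in for FPCS rather than the FPCS algorithm itself.

```python
import math


def log_quantize(x, base=2.0, bits=4, scale=1.0):
    """Map x to an integer index q such that x is approximately scale * base**(-q)."""
    qmax = 2 ** bits - 1
    if x <= 0.0:
        # Assumption: non-positive inputs take the largest index,
        # i.e. the smallest representable magnitude.
        return qmax
    q = round(-math.log(x / scale, base))
    # Clamp to the representable index range [0, qmax].
    return max(0, min(qmax, q))


def log_dequantize(q, base=2.0, scale=1.0):
    """Map an integer index back to its approximate real value."""
    return scale * base ** (-q)


def pick_base(samples, candidates, bits=4, scale=1.0):
    """Return the candidate base with the lowest squared reconstruction error.

    Hypothetical stand-in for a calibration search: it evaluates a fixed
    list of candidates once and does not refine the grid progressively.
    """
    def error(base):
        return sum(
            (x - log_dequantize(log_quantize(x, base, bits, scale), base, scale)) ** 2
            for x in samples
        )

    return min(candidates, key=error)
```

A smaller base places the quantization levels closer together, which gives finer resolution near the top of the range at the cost of covering a narrower dynamic range for the same bit width. Choosing the base on calibration data is how the quantizer adapts to the shape of a long-tailed activation distribution.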