PTQ: Post-Training Quantization—reducing model precision after training without full re-training
FP8: 8-bit Floating Point format, specifically E4M3 (4 exponent bits, 3 mantissa bits) or E5M2 (5 exponent bits, 2 mantissa bits) in this paper
FP4: 4-bit Floating Point format, specifically E2M1 (2 exponent bits, 1 mantissa bit) in this paper
W4A8: Quantization scheme using 4-bit weights and 8-bit activations
LoRC: Low Rank Compensation—an error correction method that uses low-rank matrix decomposition to approximate and subtract quantization errors
PPL: Perplexity—a metric for measuring how well a probability model predicts a sample; lower values indicate better performance
FGQ: Fine-Grained Quantization—applying quantization parameters at a granular level (e.g., per group of weights) rather than per tensor
outliers: Extreme values in activation distributions that skew uniform quantization ranges, causing loss of precision for the majority of data
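To make the interaction between fine-grained quantization (FGQ) and outliers concrete, the following is a minimal sketch of symmetric uniform quantization applied per tensor versus per group. The function names, group size, and example weight vector are illustrative assumptions, not the paper's implementation; the sketch only shows why a single outlier inflates a per-tensor scale while per-group scales contain the damage.

```python
import numpy as np

def quantize_symmetric(x, n_bits=4):
    # Symmetric uniform quantization: scale set by max |x|,
    # values rounded onto a (2^(n_bits-1) - 1)-level integer grid.
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale  # dequantized values

def quantize_per_group(x, n_bits=4, group_size=4):
    # Fine-grained quantization: each contiguous group of weights
    # gets its own scale, so one extreme value only affects its group.
    out = np.empty_like(x)
    for i in range(0, len(x), group_size):
        out[i:i + group_size] = quantize_symmetric(x[i:i + group_size], n_bits)
    return out

# Hypothetical weight vector: one outlier (8.0) dominates the tensor range.
w = np.array([0.1, -0.2, 0.15, 0.05, 8.0, 0.12, -0.08, 0.2])

err_tensor = np.abs(quantize_symmetric(w) - w).mean()
err_group = np.abs(quantize_per_group(w) - w).mean()
assert err_group < err_tensor  # per-group scales isolate the outlier
```

With a per-tensor scale, the outlier stretches the quantization range so far that every small weight rounds to zero; per-group scales recover most of that precision, which is the motivation for FGQ.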