PTQ: Post-Training Quantization—compressing a trained model's weights to lower precision without re-training from scratch
SQNR: Signal-to-Quantization-Noise Ratio—the ratio of signal power to quantization-error power, expressed in dB (higher means the quantized values track the originals more closely)
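As a quick numeric illustration, SQNR can be computed directly from the original and quantized values. A minimal pure-Python sketch, with made-up weights and rounding to one decimal place as the "quantizer":

```python
import math

def sqnr_db(original, quantized):
    """SQNR in dB: 10 * log10(signal power / quantization-noise power)."""
    signal_power = sum(x * x for x in original)
    noise_power = sum((x - q) ** 2 for x, q in zip(original, quantized))
    return 10.0 * math.log10(signal_power / noise_power)

# Toy weights, quantized by rounding to one decimal place:
weights = [0.81, -0.47, 0.33, -0.12]
rounded = [round(w, 1) for w in weights]
print(f"{sqnr_db(weights, rounded):.1f} dB")  # roughly 26 dB
```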
GPTQ: Generative Pre-trained Transformer Quantization—an algorithm that compresses LLM weights by minimizing layer-wise reconstruction error using second-order (Hessian) information
RTN: Round-to-Nearest—a simple quantization baseline that rounds weights to the nearest representable value in the target format
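RTN is simple enough to sketch in a few lines. This toy example (function name and weight values are illustrative, not from any library) uses symmetric per-tensor quantization to signed 4-bit integers:

```python
def rtn_quantize(weights, bits=4):
    """Round-to-nearest symmetric quantization to signed `bits`-bit integers."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax  # one scale for the whole tensor
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    dequantized = [qi * scale for qi in q]
    return q, dequantized, scale

q, deq, scale = rtn_quantize([0.9, -0.4, 0.25, -0.05])
# q == [7, -3, 2, 0]; each weight is recovered as q[i] * scale
```

Unlike GPTQ, this rounds every weight independently, ignoring how one weight's rounding error interacts with the others; that is why it serves as the baseline.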
Microscaling (MX): A data format specification (e.g., OCP Microscaling) where blocks of elements share a common scale factor to allow efficient low-precision representation
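The core idea (a shared scale per block) can be sketched as follows. This is a simplified illustration of the concept, not the exact bit layout or element types defined by the OCP specification:

```python
import math

def mx_quantize_block(block, elem_bits=4):
    """One MX-style block: signed `elem_bits` integers sharing a power-of-two scale.

    A conceptual sketch of Microscaling's shared-scale idea, not the
    OCP-specified encoding.
    """
    qmax = 2 ** (elem_bits - 1) - 1
    amax = max(abs(x) for x in block)
    # Smallest power-of-two scale that fits the largest element into range:
    exponent = math.ceil(math.log2(amax / qmax)) if amax > 0 else 0
    scale = 2.0 ** exponent
    elems = [max(-qmax - 1, min(qmax, round(x / scale))) for x in block]
    return scale, elems

scale, elems = mx_quantize_block([0.5, -0.22, 0.1, 0.03])
# scale == 0.125 (2**-3); elems == [4, -2, 1, 0]
```

Because the scale is stored once per block rather than once per element, the amortized storage cost per element stays close to the raw element width.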
NLL: Negative Log-Likelihood—a loss metric used to evaluate the quality of language model predictions (lower is better)
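Concretely, mean NLL is the average of the negative log-probabilities a model assigned to the tokens that actually occurred. A tiny sketch with made-up probabilities:

```python
import math

# Probabilities a model assigned to the observed tokens (made-up numbers):
token_probs = [0.7, 0.1, 0.05]
nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
print(f"mean NLL: {nll:.3f} nats")  # about 1.885
```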
Pareto frontier: The set of optimal trade-offs where no improvement in one metric (e.g., model size) is possible without degrading another (e.g., loss)
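For concreteness, here is a minimal sketch of extracting the Pareto frontier from (size, loss) pairs where both metrics are minimized; the candidate values are made up for illustration:

```python
def pareto_frontier(points):
    """Return the points not dominated by any other (assumes distinct points).

    A point (size, loss) is dominated when some other point is at least as
    good on both axes; both metrics are minimized here.
    """
    return [
        p for p in points
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    ]

# Made-up (model size in GB, eval loss) pairs:
candidates = [(1, 3.0), (2, 2.5), (2, 2.8), (4, 2.4)]
# (2, 2.8) is dominated by (2, 2.5); the other three form the frontier.
```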