PTQ: Post-Training Quantization—compressing a model after training using limited data, without full fine-tuning
ModCap: Module Capacity—a metric defined by the paper to quantify a layer's ability to represent information, based on parameters, bit-width, and stride
Oscillation: The phenomenon where reconstruction loss fluctuates (increases and decreases) significantly across layers/blocks during sequential quantization, rather than decreasing monotonically
Topological Homogeneity: A condition defined by the authors where two modules share hyperparameters (like stride/groups) except kernel size and channels, allowing their capacities to be compared
Mixed Reconstruction Granularity: The strategy of varying the size of the unit being optimized (e.g., merging two blocks) based on capacity differences
BRECQ: Block Reconstruction Quantization—a baseline PTQ method that optimizes reconstruction error within local blocks
QDROP: A PTQ method that randomly drops activation quantization during reconstruction to flatten the loss landscape
Diminishing Marginal Utility: The economic concept applied here to batch size. Adding more calibration data helps, but the benefit shrinks as the batch size grows
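
To make the Oscillation entry concrete, the sketch below shows one possible way to tell an oscillating per-block reconstruction loss from a monotonically decreasing one. The function names, the `tol` parameter, and the choice of counting direction changes are illustrative assumptions, not the paper's own measure.

```python
from typing import Sequence


def count_direction_changes(losses: Sequence[float], tol: float = 0.0) -> int:
    """Count how often a per-block loss sequence switches between rising and falling.

    Hypothetical helper: `losses[i]` is assumed to be the reconstruction loss
    measured after quantizing block i. Steps smaller than `tol` are ignored.
    """
    changes = 0
    prev_sign = 0  # 0 means no significant step has been seen yet
    for current, following in zip(losses, losses[1:]):
        diff = following - current
        if abs(diff) <= tol:
            continue  # treat tiny steps as flat
        sign = 1 if diff > 0 else -1
        if prev_sign != 0 and sign != prev_sign:
            changes += 1
        prev_sign = sign
    return changes


def loss_amplitude(losses: Sequence[float]) -> float:
    """Peak-to-peak spread of the loss sequence (an assumed proxy for oscillation size)."""
    return max(losses) - min(losses)


# Made-up numbers for illustration only; these are not results from the paper.
monotone = [1.0, 0.8, 0.6, 0.5]
oscillating = [1.0, 2.0, 1.5, 2.5, 1.0]
print(count_direction_changes(monotone), count_direction_changes(oscillating))
```

A sequence with zero direction changes is monotone, while a large count combined with a large amplitude matches the oscillating behaviour the glossary describes.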