
Quantization Error Propagation: Revisiting Layer-Wise Post-Training Quantization

Yamato Arai, Yuma Ichikawa
Department of Basic Science, The University of Tokyo
arXiv.org (2025)

📝 Paper Summary

Model Compression · Post-Training Quantization (PTQ)
QEP improves layer-wise post-training quantization by explicitly propagating quantization errors from preceding layers and optimizing the current layer's weights to compensate for them, rather than treating each layer as an independent problem.
Core Problem
Existing layer-wise PTQ methods treat each layer's quantization as an independent optimization problem, ignoring how quantization errors accumulate and grow across layers.
Why it matters:
  • Accumulated errors significantly degrade model performance, especially in low-bit regimes (e.g., 2-bit) where precision is scarce.
  • The standard approach saturates in performance because it optimizes local reconstruction without accounting for the global drift caused by upstream quantization.
Concrete Example: In a Llama-2-7B model where only the first 10 blocks are quantized, the error (the distance between full-precision and quantized features) grows exponentially across blocks, because each block's weights are optimized assuming clean inputs: the 10th block, for instance, never corrects the noisy activations it actually receives from the 9th.
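This compounding effect is easy to reproduce in a toy setting. The snippet below is a minimal numpy sketch, not the paper's Llama-2-7B experiment: a stack of random linear layers is quantized with a crude uniform quantizer, and the relative distance between full-precision and quantized features is printed per block. The dimensions, quantizer, and nonlinearity are illustrative assumptions.

```python
# Toy illustration (not the paper's experiment): quantization error compounds
# because each quantized layer sees inputs already corrupted by upstream layers.
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=2):
    """Crude symmetric, per-tensor uniform quantizer (for illustration only)."""
    levels = 2 ** (bits - 1) - 1 or 1
    scale = np.abs(w).max() / levels
    return np.clip(np.round(w / scale), -levels, levels) * scale

d, depth = 64, 10
x = rng.standard_normal((d, 256))                      # calibration-like activations
weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(depth)]

h_fp, h_q = x, x
for l, w in enumerate(weights, start=1):
    h_fp = np.tanh(w @ h_fp)                           # full-precision forward
    h_q = np.tanh(quantize(w) @ h_q)                   # quantized forward on noisy inputs
    err = np.linalg.norm(h_fp - h_q) / np.linalg.norm(h_fp)
    print(f"block {l:2d}: relative feature error = {err:.3f}")
```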
Key Novelty
Quantization Error Propagation (QEP)
  • Reformulates the layer-wise objective to minimize the error between the original output (from clean inputs) and the quantized output (from noisy, quantized inputs), rather than just matching local behavior.
  • Derives a closed-form weight correction term that adjusts the current layer's weights to counteract the specific noise pattern introduced by previous layers (see the sketch after this list).
  • Introduces a tunable scalar parameter alpha that controls the strength of this correction to prevent overfitting to the calibration data, particularly in parameter-heavy MLP blocks.
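The sketch below is one way to read the correction described above; the function name and the exact closed form are assumptions based on this summary, not the paper's code. Given clean calibration activations and the same activations propagated through the already-quantized preceding layers, it solves a damped least-squares problem for weights that map the noisy inputs back onto the original layer output, then interpolates with alpha.

```python
# Hedged sketch of a QEP-style correction (names and closed form are assumptions).
import numpy as np

def qep_correct_weights(w, x_clean, x_noisy, alpha=1.0, damp=1e-4):
    """Return weights corrected to compensate for upstream quantization noise.

    w       : (d_out, d_in) original full-precision weights
    x_clean : (d_in, n) calibration activations from the full-precision model
    x_noisy : (d_in, n) the same activations after the already-quantized
              preceding layers (the inputs this layer will actually receive)
    alpha   : correction strength; alpha=0 recovers the standard layer-wise
              objective, alpha=1 applies the full least-squares correction
    """
    # Least-squares solution of  min_W' || w @ x_clean - W' @ x_noisy ||_F^2,
    # with a small damping term on the Gram matrix for numerical stability.
    gram = x_noisy @ x_noisy.T
    gram += damp * np.trace(gram) / gram.shape[0] * np.eye(gram.shape[0])
    w_star = (w @ x_clean @ x_noisy.T) @ np.linalg.inv(gram)
    # Interpolate between the original weights and the fully corrected ones.
    return w + alpha * (w_star - w)
```

In use, the corrected weights would then be handed to a standard layer-wise quantizer (e.g., GPTQ), and that quantized layer's outputs on the noisy inputs would serve as the noisy inputs for the next layer.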
Evaluation Highlights
  • Achieves substantially higher accuracy than GPTQ, AWQ, and QuIP across various LLMs.
  • Improvements are most pronounced in the extremely low-bit regime (e.g., 2-bit quantization), where standard methods degrade significantly.
  • Maintains computational complexity comparable to existing layer-wise PTQ methods while offering a scalable framework.
Breakthrough Assessment
7/10
Addresses a fundamental theoretical oversight in widely used PTQ methods (independence assumption). The closed-form correction is elegant and low-cost, though the primary value is in low-bit regimes where current methods fail.