
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer

Zhuguanyu Wu, Jiaxin Chen, Hanwen Zhong, Di Huang, Yunhong Wang
European Conference on Computer Vision (2024)

📝 Paper Summary

Post-Training Quantization (PTQ), Vision Transformers (ViT), Model Compression
AdaLog improves low-bit quantization of Vision Transformers by using an adaptive logarithmic base for power-law activations and a fast progressive search to optimize parameters.
Core Problem
Existing PTQ methods for ViT activations use fixed logarithmic bases (like base-2) that cannot adapt to varying power-law distributions across layers, leading to high error at low bit-widths.
Why it matters:
  • Vision Transformers are computationally expensive and slow on edge devices, necessitating compression.
  • Current fixed-base log quantizers suffer from either large rounding errors for large values or truncation errors for small values when the bit-width is low (e.g., 4-bit).
  • Standard grid search for quantization parameters is either too coarse (missing optima) or too slow (brute force).
Concrete Example: The Log2 quantizer incurs substantial rounding errors for large activations at 4 bits, while the Log-Sqrt(2) quantizer suffers from truncation errors for small activations at 3 bits. Additionally, Log-Sqrt(2) de-quantization requires floating-point multiplication during inference, which is not hardware-friendly.
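To make this trade-off concrete, here is a minimal sketch of a fixed-base log quantizer, assuming the common form z = clip(round(-log_base v), 0, 2^bits - 1) with de-quantization base^(-z) for activations in (0, 1] (e.g., post-Softmax). The function name and the probe values are illustrative, not taken from the paper.

```python
import math

def log_quant_dequant(x, base, bits):
    """Fixed-base log quantizer (illustrative sketch).

    Each v in (0, 1] is mapped to an integer code z = clip(round(-log_base v), 0, 2^bits - 1)
    and reconstructed as base^(-z).
    """
    max_code = 2 ** bits - 1
    out = []
    for v in x:
        e = -math.log(v, base)                            # assumes 0 < v <= 1
        z = min(max(math.floor(e + 0.5), 0), max_code)    # round half up, then clamp to code range
        out.append(base ** (-z))
    return out

# Large activation at 4 bits: the Log2 grid near 1 is coarse (1, 0.5, 0.25, ...)
print(log_quant_dequant([0.7], 2, 4))              # 0.7 -> 0.5 (large rounding error)
print(log_quant_dequant([0.7], math.sqrt(2), 4))   # 0.7 -> about 0.707
# Small activation at 3 bits: Log-Sqrt(2) cannot go below sqrt(2)^-7 (about 0.088)
print(log_quant_dequant([0.01], 2, 3))             # 0.01 -> 2^-7 = 0.0078125
print(log_quant_dequant([0.01], math.sqrt(2), 3))  # 0.01 -> about 0.088 (truncated)
```

A finer base such as √2 fits large values better but exhausts its few codes quickly on small values, while base 2 does the opposite; this tension is what a per-layer base is meant to resolve.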
Key Novelty
Adaptive Logarithm (AdaLog) Quantizer with Fast Progressive Combining Search (FPCS)
  • Proposes a non-uniform quantizer that optimizes the logarithmic base per layer instead of using a fixed base (like 2), better fitting the power-law distribution of post-Softmax/GELU activations.
  • Implements a hardware-friendly de-quantization mechanism using look-up tables and integer-only arithmetic, avoiding floating-point operations despite the arbitrary base.
  • Introduces a search strategy (FPCS) that progressively refines the hyperparameter search space (coarse-to-fine) to find optimal quantization parameters efficiently.
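As a minimal sketch of the per-layer idea, assume the quantizer has the form z = clip(round(-log_b x), 0, 2^k - 1) with de-quantization b^(-z); the exact AdaLog formulation (including any scaling factor) may differ. The base b is then just a free parameter that can be chosen per layer by minimizing reconstruction error on calibration activations. The plain 1-D sweep below stands in for FPCS, and the function names and synthetic calibration data are hypothetical.

```python
import math
import random

def quantize_log(x, base, bits):
    """Integer codes z = clip(round(-log_base v), 0, 2^bits - 1) for activations in [0, 1]."""
    max_code = 2 ** bits - 1
    codes = []
    for v in x:
        e = -math.log(max(v, 1e-12), base)   # guard against log(0)
        codes.append(min(max(math.floor(e + 0.5), 0), max_code))
    return codes

def dequantize_log(codes, base):
    return [base ** (-z) for z in codes]

def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def pick_base(x, bits, candidates):
    """Return (base, error) for the candidate base with the lowest reconstruction MSE on x."""
    best = None
    for b in candidates:
        err = mse(x, dequantize_log(quantize_log(x, b, bits), b))
        if best is None or err < best[1]:
            best = (b, err)
    return best

# Hypothetical calibration data: heavily skewed toward zero, loosely mimicking post-Softmax values
random.seed(0)
calib = [random.random() ** 8 for _ in range(2000)]
candidates = [1.1 + 0.1 * i for i in range(9)] + [2.0, math.sqrt(2)]
base, err = pick_base(calib, 3, candidates)
print(f"chosen base {base:.3f}, MSE {err:.2e}")
```

Because base 2 and √2 are both among the candidates, the selected base can never do worse than either fixed-base quantizer on the calibration set.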
Architecture
Figure 3: Comparison of de-quantization flows for standard Log2, Log-Sqrt(2), and the proposed AdaLog.
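The summary does not spell out the de-quantization arithmetic, so the following is only a sketch under an assumed scheme: rewrite b^(-z) as 2^(-z·log2 b), store log2 b in fixed point, apply the integer part of the exponent as a right shift, and read the fractional part from a small table precomputed offline. The names `build_dequant` and `dequant_int` and the bit widths `frac_bits` and `out_bits` are hypothetical choices, not the paper's.

```python
import math

def build_dequant(base, frac_bits=8, out_bits=16):
    """Offline step (floats allowed): fixed-point log2(base) and a LUT of 2^(-f / 2^frac_bits)."""
    t_fixed = round(math.log2(base) * (1 << frac_bits))   # log2(b) with frac_bits fractional bits
    lut = [round(2 ** (-f / (1 << frac_bits)) * (1 << out_bits))
           for f in range(1 << frac_bits)]
    return t_fixed, lut

def dequant_int(z, t_fixed, lut, frac_bits=8):
    """Online step, integer-only: approximates b^(-z) scaled by 2^out_bits."""
    e = z * t_fixed                            # exponent z * log2(b) in fixed point
    int_part = e >> frac_bits                  # whole powers of two -> right shift
    frac_part = e & ((1 << frac_bits) - 1)     # fractional power of two -> table lookup
    return lut[frac_part] >> int_part

t_fixed, lut = build_dequant(1.5)              # an arbitrary, non-power-of-two base
for z in (0, 1, 4, 15):
    print(z, dequant_int(z, t_fixed, lut) / (1 << 16), 1.5 ** -z)
```

The table has 2^frac_bits entries regardless of the base, so changing the base per layer only changes `t_fixed` and the table contents, not the integer datapath.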
Evaluation Highlights
  • Significantly outperforms state-of-the-art PTQ methods on ImageNet classification, COCO detection, and segmentation tasks.
  • Achieves higher accuracy at low bit-widths (e.g., 4-bit and 3-bit) compared to fixed-base log quantizers.
  • The FPCS strategy locates optimal hyperparameters more precisely and with linear complexity, unlike brute-force search (quadratic complexity) or alternating search (prone to local optima).
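FPCS itself is not detailed in this summary. The sketch below is a generic coarse-to-fine search with top-k combining, written only to illustrate how such a scheme can cost a number of evaluations linear in the candidates per round (n + top_k·n) rather than quadratic (n·n for a full grid). The function name, its parameters (`n`, `top_k`, `rounds`, `shrink`), and the toy loss are all hypothetical; in AdaLog the two searched quantities would be its quantizer hyperparameters.

```python
def progressive_search(loss, a_range, b_range, n=8, top_k=3, rounds=4, shrink=0.25):
    """Coarse-to-fine search over two hyperparameters (hypothetical sketch, not the exact FPCS)."""
    def grid(lo, hi):
        # n evenly spaced candidates in [lo, hi]
        return [lo + (hi - lo) * i / (n - 1) for i in range(n)]

    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    best_a, best_b = (a_lo + a_hi) / 2, (b_lo + b_hi) / 2
    best_loss, evals = loss(best_a, best_b), 1
    for _ in range(rounds):
        # 1) sweep `a` with `b` fixed at its current best; keep the top_k values of `a`
        a_scores = sorted((loss(a, best_b), a) for a in grid(a_lo, a_hi))
        evals += n
        if a_scores[0][0] < best_loss:
            best_loss, best_a = a_scores[0]
        # 2) combine each kept `a` with a sweep over `b`: top_k * n evaluations
        for _, a in a_scores[:top_k]:
            for b in grid(b_lo, b_hi):
                cur = loss(a, b)
                evals += 1
                if cur < best_loss:
                    best_loss, best_a, best_b = cur, a, b
        # 3) shrink both search windows around the current best (coarse -> fine)
        a_half, b_half = (a_hi - a_lo) * shrink, (b_hi - b_lo) * shrink
        a_lo, a_hi = best_a - a_half, best_a + a_half
        b_lo, b_hi = best_b - b_half, best_b + b_half
    return best_a, best_b, best_loss, evals

def toy_loss(a, b):
    # Hypothetical convex stand-in for calibration error, minimized at (1.3, 0.7)
    return (a - 1.3) ** 2 + (b - 0.7) ** 2 + 0.5 * (a - 1.3) * (b - 0.7)

a, b, final_loss, evals = progressive_search(toy_loss, (1.0, 2.0), (0.0, 1.0))
print(f"a={a:.3f} b={b:.3f} loss={final_loss:.2e} evals={evals}")
```

Keeping several candidates for one parameter before sweeping the other is what distinguishes this from a pure alternating search, which commits to a single value at each step and can stall in a local optimum.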
Breakthrough Assessment
7/10
A strong practical improvement for low-bit ViT quantization that addresses specific distribution mismatches in prior work. The hardware-friendly implementation of arbitrary logarithmic bases is a clever engineering contribution.