
Post-training Quantization with Progressive Calibration and Activation Relaxing for Text-to-Image Diffusion Models

Siao Tang, Xin Wang, Hong Chen, Chaoyu Guan, Zewen Wu, Yansong Tang, Wenwu Zhu
Tsinghua University
European Conference on Computer Vision (2024)
MM Benchmark

📝 Paper Summary

Model Compression, Generative AI
PCR improves post-training quantization of text-to-image diffusion models by accounting for the error that accumulates over the multi-step denoising process and by keeping the most quantization-sensitive timesteps at higher precision.
Core Problem
Existing quantization methods for diffusion models ignore how quantization errors accumulate across successive denoising steps, and they do not account for how the sensitivity of text-to-image models varies from one timestep to another.
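To make the error-accumulation argument concrete, here is a toy sketch, not the paper's code. A hypothetical update `x <- x + 0.1 * tanh(W x)` stands in for one denoising step, and its activation is fake-quantized to 8 bits at every step. Comparing the quantized trajectory with the full-precision one starting from the same noise shows per-step rounding error compounding over the sampling loop. `fake_quant`, `run_sampler`, and the update rule are illustrative assumptions.

```python
import numpy as np


def fake_quant(h: np.ndarray, scale: float, bits: int = 8) -> np.ndarray:
    """Symmetric uniform fake quantization: round onto a signed integer grid, clip, rescale."""
    qmax = 2 ** (bits - 1) - 1
    return np.clip(np.round(h / scale), -qmax - 1, qmax) * scale


def run_sampler(x_T: np.ndarray, W: np.ndarray, steps: int, quantize: bool) -> list:
    """Run the hypothetical update x <- x + 0.1 * tanh(W x) for `steps` steps.

    When `quantize` is True, the activation tanh(W x) is fake-quantized to 8 bits
    at every step, so each step's rounding error feeds into all later steps.
    """
    x, states = x_T.copy(), []
    for _ in range(steps):
        h = np.tanh(W @ x)                      # toy "activation", bounded in [-1, 1]
        if quantize:
            h = fake_quant(h, scale=1.0 / 127)  # static scale covering tanh's range
        x = x + 0.1 * h                         # new array each step, so states do not alias
        states.append(x)
    return states


rng = np.random.default_rng(0)
d, T = 64, 50
W = rng.normal(size=(d, d)) / np.sqrt(d)
x_T = rng.normal(size=d)                        # shared starting noise for both runs

fp_states = run_sampler(x_T, W, T, quantize=False)
q_states = run_sampler(x_T, W, T, quantize=True)
errors = [float(np.linalg.norm(a - b)) for a, b in zip(fp_states, q_states)]
print(f"deviation after step 1: {errors[0]:.5f}, after step {T}: {errors[-1]:.5f}")
```

The first-step deviation is bounded by a single rounding error, while the final deviation collects every step's error propagated through all later steps.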
Why it matters:
  • Large diffusion models like Stable Diffusion XL are computationally expensive, making deployment on consumer hardware difficult without compression
  • Current evaluation metrics (standard FID on COCO) are inaccurate for large-scale text-to-image models because of the distribution gap between the COCO reference images and the models' outputs, potentially hindering progress in the field
  • Previous quantization approaches result in significant degradation of image fidelity or text-image alignment because they treat all timesteps equally
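For reference, FID compares Gaussian fits of image features from a reference set and a generated set, so the score depends directly on which reference set is used. The sketch below computes the Fréchet distance from precomputed feature arrays with NumPy only. Real FID pipelines extract those features with an Inception-v3 network, which is omitted here, and `frechet_distance` is an illustrative name, not a library API.

```python
import numpy as np


def _sqrtm_psd(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussian fits of two feature sets (rows are samples).

    d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2}).
    S_a S_b and S_a^{1/2} S_b S_a^{1/2} have the same eigenvalues (AB vs. BA), so
    Tr((S_a S_b)^{1/2}) can be computed using only symmetric PSD square roots.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    s_a = np.cov(feats_a, rowvar=False)
    s_b = np.cov(feats_b, rowvar=False)
    root_a = _sqrtm_psd(s_a)
    cross = _sqrtm_psd(root_a @ s_b @ root_a)
    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    return mean_term + float(np.trace(s_a) + np.trace(s_b) - 2.0 * np.trace(cross))


rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))   # stand-in for reference-set features
fake = real + 1.0                  # same covariance, mean shifted by 1 in every dimension
print(frechet_distance(real, real))  # approximately 0
print(frechet_distance(real, fake))  # approximately 8 (squared shift summed over 8 dims)
```

Because the distance is measured against the reference statistics, a reference set whose distribution differs from what the model generates shifts the score even when the model itself is unchanged.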
Concrete Example: When Stable Diffusion XL is quantized to 8-bit with previous methods such as Q-Diffusion, the model loses the ability to follow the textual semantics (e.g., it generates a generic scene instead of the one the prompt describes), whereas the proposed method preserves text-image alignment.
Key Novelty
Progressive Calibration and Activation Relaxing (PCR)
  • Progressive Calibration: Instead of calibrating against a full-precision model, PCR calibrates the quantizer for timestep t on data generated with all earlier denoising steps (t+1 to T) already quantized, so the calibration explicitly accounts for the error accumulated up to that step
  • Activation Relaxing: Identifies a small set of 'sensitive' timesteps (early steps matter for fidelity, later steps for text alignment) and keeps the activations at those steps at higher precision (e.g., 10-bit) while quantizing all remaining steps to the lower target bit-width
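The ordering behind progressive calibration can be sketched on a toy sampler. This is a minimal sketch under stated assumptions: the hypothetical `activation` and `update` functions stand in for the UNet and the sampler, quantization is symmetric 8-bit fake quantization, and scales come from simple min-max estimation rather than whatever estimator the paper uses. The only difference between the baseline and the progressive variant is whether step t's quantized output, rather than its full-precision output, produces the inputs on which step t-1 is calibrated.

```python
import numpy as np

QMAX = 127  # largest magnitude on a signed 8-bit integer grid


def fake_quant(h: np.ndarray, scale: float) -> np.ndarray:
    """Symmetric uniform fake quantization with a fixed (calibrated) scale."""
    return np.clip(np.round(h / scale), -QMAX - 1, QMAX) * scale


def activation(xs: np.ndarray, W: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical step-t activation of a toy denoiser (stand-in for UNet features).

    The gain depends on t so that each timestep has its own activation range.
    """
    return np.tanh(xs @ W.T) * (1.0 + t / 10.0)


def update(xs: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Hypothetical sampler update from x_t to x_{t-1}."""
    return xs + 0.05 * h


def calibrate(calib_noise: np.ndarray, W: np.ndarray, T: int, progressive: bool) -> dict:
    """Per-timestep min-max activation scales, visited in sampling order t = T..1.

    progressive=False: step t sees inputs from a full-precision trajectory (baseline).
    progressive=True:  step t sees inputs produced by steps T..t+1 that are already
                       quantized with their calibrated scales, so the accumulated
                       error is part of what step t is calibrated on.
    """
    scales, xs = {}, calib_noise.copy()
    for t in range(T, 0, -1):
        h = activation(xs, W, t)
        scales[t] = float(np.abs(h).max()) / QMAX  # calibrate step t on what it actually sees
        if progressive:
            h = fake_quant(h, scales[t])           # quantize step t before moving on to t-1
        xs = update(xs, h)
    return scales


rng = np.random.default_rng(0)
d, T, n = 16, 20, 32
W = rng.normal(size=(d, d)) / np.sqrt(d)
calib_noise = rng.normal(size=(n, d))  # calibration batch of starting noise

base_scales = calibrate(calib_noise, W, T, progressive=False)
prog_scales = calibrate(calib_noise, W, T, progressive=True)
for t in (T, T // 2, 1):
    print(t, base_scales[t], prog_scales[t])
```

The first sampling step (t = T) gets identical scales in both variants because no quantized step precedes it; differences can only appear from t = T-1 onward, once quantized outputs start feeding the calibration.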
Architecture
Figure 1: Overview of the PCR method, illustrating Progressive Calibration (step-by-step quantization awareness) and Activation Relaxing (mixed precision for sensitive steps).
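Activation relaxing amounts to a per-timestep activation bit-width schedule. A minimal sketch, assuming the sensitive steps are already known (the set below is hypothetical, chosen to mirror the "early and late steps" description, whereas the paper identifies them empirically), 10-bit activations at those steps, 8-bit elsewhere, and simple per-tensor min-max fake quantization. `bit_schedule` and `fake_quant` are illustrative names.

```python
import numpy as np


def bit_schedule(T: int, sensitive_steps: set, low_bits: int = 8, high_bits: int = 10) -> dict:
    """Activation bit-width per timestep: relaxed (higher) precision only at sensitive steps."""
    return {t: (high_bits if t in sensitive_steps else low_bits) for t in range(T, 0, -1)}


def fake_quant(h: np.ndarray, bits: int) -> np.ndarray:
    """Symmetric per-tensor min-max fake quantization at the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = float(np.abs(h).max()) / qmax
    if scale == 0.0:
        return h.copy()                 # all-zero tensor: nothing to quantize
    return np.round(h / scale) * scale  # |h / scale| <= qmax by construction, so no clip


T = 50
# Hypothetical sensitive set: the first three and the last three sampling steps.
sensitive = {50, 49, 48, 3, 2, 1}
schedule = bit_schedule(T, sensitive)
avg_bits = sum(schedule.values()) / len(schedule)
print(f"{sum(b == 10 for b in schedule.values())} relaxed steps, average bit-width {avg_bits:.2f}")
# prints: 6 relaxed steps, average bit-width 8.24

rng = np.random.default_rng(0)
h = rng.normal(size=4096)               # stand-in activation tensor
for bits in (8, 10):
    err = float(np.abs(fake_quant(h, bits) - h).mean())
    print(f"{bits}-bit mean abs rounding error: {err:.5f}")
```

With 6 of 50 steps relaxed, the average activation bit-width rises only to 8.24, while each relaxed step gets a grid roughly 4x finer (a maximum integer magnitude of 511 instead of 127).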
Evaluation Highlights
  • First successful quantization of Stable Diffusion XL (3.5B parameters) with near-full-precision quality: 6.84 FID on QDiffBench, compared with 6.78 for the full-precision model
  • Outperforms the previous state-of-the-art method, Q-Diffusion, on Stable Diffusion under W8A8 quantization, achieving 8.64 FID versus Q-Diffusion's 10.96
  • The proposed activation relaxing strategy improves CLIP Score on Stable Diffusion XL from 0.310 (W8A8) to 0.319 (PCR), matching the full-precision model's 0.319
Breakthrough Assessment
8/10
A strong contribution: it is the first work to quantize SDXL effectively, and it identifies critical flaws in previous evaluation benchmarks. The progressive calibration idea is theoretically grounded and practically effective.