
Post-Training Quantization for Video Matting

Tianrui Zhu, Houyuan Chen, Ruihao Gong, Michele Magno, Haotong Qin, Kai Zhang
Nanjing University, SenseTime Research
arXiv.org (2025)
MM

📝 Paper Summary

Video Matting Model Compression
PTQ4VM is a post-training quantization framework for video matting that combines block-wise optimization with global statistical calibration and optical-flow-guided temporal consistency to minimize accuracy loss.
Core Problem
Directly applying standard post-training quantization to video matting models causes severe accuracy degradation and temporal flickering due to cumulative statistical shifts (especially from BN layers) and fragile recurrent dynamics.
Why it matters:
  • Video matting is computationally intensive, making real-time deployment on edge devices difficult without compression
  • Existing PTQ methods often neglect the specific statistical distortions caused by Batch Normalization folding in deep networks
  • Recurrent architectures in video models are highly sensitive to quantization noise, leading to visible artifacts like jittering mattes
Concrete Example: When the RVM model is quantized to 4-bit, standard PTQ methods increase error by 10-20% and produce flickering alpha mattes, with hair strands or edges disappearing inconsistently between frames. The proposed method maintains near full-precision stability.
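To make the BN-folding issue concrete, here is a minimal sketch of folding a per-channel Batch Normalization into the preceding layer and then applying 4-bit quantization. Folding itself is the standard inference-time transformation; the symmetric per-tensor quantizer and the function names `fold_bn` and `fake_quantize` are assumptions chosen for illustration, not the paper's implementation.

```python
import math

def fold_bn(weights, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold per-channel BN parameters into the preceding layer.

    weights: one list of weights per output channel.
    Returns (folded_weights, folded_bias) such that
    folded_layer(x) == BN(layer(x)) in full precision.
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weights, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)          # per-channel BN scale
        folded_w.append([w * scale for w in w_c])
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b

def fake_quantize(values, n_bits=4):
    """Symmetric uniform per-tensor quantization (quantize, then dequantize).

    Assumption: one shared step size for the whole tensor.
    """
    qmax = 2 ** (n_bits - 1) - 1                # 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    step = max_abs / qmax
    return [max(-qmax - 1, min(qmax, round(v / step))) * step for v in values]
```

Because each channel is rescaled by its own `gamma / sqrt(var + eps)`, the folded weights can span a much wider range than the originals. Under a shared step size, channels with small scales are then represented by very few quantization levels, which is one way the statistical shift described above can arise.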
Key Novelty
PTQ4VM: Statistical Calibration + Optical Flow Guidance
  • Introduces Global Affine Calibration (GAC) to statistically compensate for distribution shifts caused by Batch Normalization folding and cumulative quantization errors across the network
  • Incorporates an Optical Flow Assistance (OFA) component that warps previous frame predictions to the current frame, using this as a temporal prior to guide the quantization process and reduce flickering
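The summary does not give the exact form of GAC. One plausible reading, sketched below, is a single scale and shift fitted on calibration data so that the quantized model's output statistics match those of the full-precision model. The moment-matching rule and the names `fit_affine_calibration` and `apply_affine` are assumptions for illustration only.

```python
import statistics

def fit_affine_calibration(fp_outputs, q_outputs):
    """Fit (scale, shift) so that scale * q + shift matches the mean and
    standard deviation of the full-precision outputs.

    Assumption: first- and second-moment matching on a calibration set.
    """
    mu_fp = statistics.fmean(fp_outputs)
    mu_q = statistics.fmean(q_outputs)
    sd_fp = statistics.pstdev(fp_outputs)
    sd_q = statistics.pstdev(q_outputs)
    scale = sd_fp / sd_q if sd_q > 0 else 1.0
    shift = mu_fp - scale * mu_q
    return scale, shift

def apply_affine(values, scale, shift):
    """Apply the fitted correction to quantized-model outputs."""
    return [scale * v + shift for v in values]
```

If the quantized outputs differ from the full-precision ones only by a positive scale and an offset, this correction recovers the full-precision values exactly. Real quantization error also contains a non-affine component that such a correction cannot remove.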
Evaluation Highlights
  • Reduces error by up to 20% relative to existing PTQ baselines on video matting tasks
  • Achieves 4-bit quantization performance close to full-precision counterparts while delivering 8x FLOP savings
  • Achieves state-of-the-art accuracy across bit-widths, outperforming methods such as AdaRound, BRECQ, and QDrop
Breakthrough Assessment
8/10
First systematic PTQ framework specifically for video matting. Effectively addresses both statistical drift from BN folding and temporal consistency, enabling usable 4-bit video matting.
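The temporal-consistency idea behind the Optical Flow Assistance component described under Key Novelty can be sketched as follows: the previous frame's alpha matte is warped to the current frame using optical flow, and the current prediction is penalized for deviating from it. The backward-warping convention, bilinear sampling, L1 penalty, and function names are assumptions for illustration; the summary does not specify these details.

```python
def warp_previous_alpha(prev_alpha, flow):
    """Backward-warp the previous alpha matte into the current frame.

    prev_alpha: H x W nested list of floats.
    flow: H x W nested list of (dx, dy); pixel (x, y) in the current frame
          is assumed to come from (x + dx, y + dy) in the previous frame.
    """
    h, w = len(prev_alpha), len(prev_alpha[0])
    warped = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            # Clamp the source position to the image bounds.
            sx = min(max(x + dx, 0.0), w - 1.0)
            sy = min(max(y + dy, 0.0), h - 1.0)
            x0, y0 = int(sx), int(sy)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = sx - x0, sy - y0
            # Bilinear interpolation of the four neighbouring pixels.
            top = prev_alpha[y0][x0] * (1 - fx) + prev_alpha[y0][x1] * fx
            bottom = prev_alpha[y1][x0] * (1 - fx) + prev_alpha[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        warped.append(row)
    return warped

def temporal_consistency_loss(curr_alpha, warped_prev):
    """Mean absolute difference between the current matte and the warped prior."""
    h, w = len(curr_alpha), len(curr_alpha[0])
    total = sum(abs(c - p)
                for row_c, row_p in zip(curr_alpha, warped_prev)
                for c, p in zip(row_c, row_p))
    return total / (h * w)
```

When the flow correctly describes the motion, a temporally stable prediction yields a loss near zero, while a flickering matte does not. In a PTQ setting, a term like this could be added to the reconstruction objective used during calibration.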