LiDAR-PTQ: Post-Training Quantization for Point Cloud 3D Object Detection

📝 Paper Summary

3D Object Detection, Model Compression, Edge Computing

LiDAR-PTQ enables 8-bit quantization of 3D detectors with near-zero accuracy loss by replacing entropy calibration with sparsity-aware max-min initialization and refining parameters using task-specific pseudo-label supervision.

Core Problem

Directly applying 2D quantization methods (like entropy calibration) to 3D LiDAR models causes catastrophic performance drops due to the extreme sparsity and large coordinate ranges of point clouds.

Why it matters:

LiDAR-based 3D detectors are computationally expensive, making deployment on resource-constrained edge devices (e.g., autonomous vehicles, robots) difficult.
Existing 2D PTQ (Post-Training Quantization) methods assume dense activation distributions, failing to preserve geometric information in sparse 3D data.
Retraining-based methods (QAT) are computationally expensive (taking ~94 GPU hours) and require full labeled datasets, which may be restricted for privacy.

Concrete Example: When using standard entropy calibration on CenterPoint-Pillar, the model truncates activation values to remove 'outliers.' In point clouds, these 'outliers' represent distant points (50m+ range) carrying vital geometric info. This causes a massive accuracy drop from 60.32 mAPH to 21.65 mAPH.
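The truncation effect described above can be illustrated with a small sketch. It uses a generic 8-bit uniform quantizer and synthetic activations; the data, the clipped range of [0, 10], and the helper name `fake_quantize` are all assumptions made for illustration, and the clipped range only stands in for what an outlier-removing calibrator might pick (it is not an implementation of entropy calibration).

```python
def fake_quantize(x, lo, hi, n_bits=8):
    """Quantize x onto a uniform grid over [lo, hi], then map it back to a float."""
    levels = 2 ** n_bits - 1              # 255 steps for 8 bits
    scale = (hi - lo) / levels
    zero_point = round(-lo / scale)
    q = round(x / scale) + zero_point
    q = max(0, min(levels, q))            # values outside [lo, hi] saturate here
    return (q - zero_point) * scale


# Synthetic activations (assumed): many small values plus a few large ones
# standing in for features produced by distant points.
acts = [0.01 * i for i in range(990)] + [50.0 + i for i in range(10)]

far_value = max(acts)                                        # 59.0
full_recon = fake_quantize(far_value, min(acts), max(acts))  # max-min range
clipped_recon = fake_quantize(far_value, 0.0, 10.0)          # assumed truncated range

print(f"max-min range: {far_value} -> {full_recon:.2f}")
print(f"clipped range: {far_value} -> {clipped_recon:.2f}")
```

With the full max-min range the large value survives quantization almost unchanged, while the truncated range saturates it at the upper bound, which is the kind of information loss the example attributes to entropy calibration.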

Key Novelty

Sparsity-Aware Calibration & Task-Guided Supervision

Replaces entropy calibration with a 'Sparsity-based calibration' (Max-min with grid search) to ensure the quantization range covers the full dynamic range of sparse point cloud coordinates.
Introduces Task-guided Global Positive Loss (TGPL), which uses the full-precision model's top predictions as pseudo-labels to supervise the quantized model's output, aligning the final detection task rather than just layer-wise errors.
Adopts an adaptive rounding mechanism that learns a rounding value per weight to minimize local reconstruction error.

Architecture

Illustration of point cloud sparsity and the impact on feature maps.

Evaluation Highlights

Achieves 60.12 mAPH (Level 2) on CenterPoint-Pillar (INT8) on Waymo, matching the FP32 baseline (60.32 mAPH) with negligible loss (-0.20).
Outperforms state-of-the-art 2D PTQ methods like BRECQ and QDrop by large margins (+3.85 and +2.00 mAPH respectively on CenterPoint-Pillar).
Delivers 3x inference speedup on NVIDIA Jetson AGX Orin compared to FP32, while being 30x faster to calibrate than Quantization-Aware Training (QAT).

Breakthrough Assessment

9/10

Successfully solves the 'collapse' problem of PTQ in 3D LiDAR tasks where standard methods fail completely. Achieves parity with FP32 accuracy at a fraction of QAT's calibration cost.

⚙️ Technical Details

Problem Definition

Setting: Post-Training Quantization (PTQ) of 3D object detection models without access to ground truth labels.

Inputs: Pre-trained FP32 model weights W, calibration dataset D_c (unlabeled point clouds).

Outputs: Quantized model (INT8 weights and activations) with parameters scale (s), zero-point (z), and rounding values.

Pipeline Flow

Input Point Cloud
Voxelization/Pillarization
Quantized Backbone (Sparse/Dense Conv)
Quantized Neck/Head
Detection Output

System Modules

Quantizer Initialization

Determine initial scale (s) and zero-point (z) for weights and activations

Model or implementation: Max-min calibrator + Grid Search
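A minimal sketch of how a max-min calibrator with grid search could be initialized is given below. The search grid (shrinking the range by up to roughly half), the mean-squared-error criterion, and the function names are assumptions for illustration; the paper's exact search procedure is not described in this summary.

```python
def fake_quant(values, lo, hi, n_bits=8):
    """Uniform affine quantize-dequantize of a list of floats over [lo, hi]."""
    levels = 2 ** n_bits - 1
    scale = (hi - lo) / levels
    zero_point = round(-lo / scale)
    out = []
    for v in values:
        q = max(0, min(levels, round(v / scale) + zero_point))
        out.append((q - zero_point) * scale)
    return out


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def maxmin_grid_search(values, n_bits=8, n_steps=20):
    """Start from the max-min range, then try shrunken ranges and keep the best.

    Returns (error, lo, hi). The shrink schedule is an assumed example.
    """
    # Include zero in the range so that zero is exactly representable.
    lo0, hi0 = min(min(values), 0.0), max(max(values), 0.0)
    if hi0 == lo0:                         # all-zero tensor: nothing to search
        return 0.0, lo0, hi0
    best = (mse(values, fake_quant(values, lo0, hi0, n_bits)), lo0, hi0)
    for i in range(1, n_steps):
        ratio = 1.0 - i / (2 * n_steps)    # 0.975 down to about 0.525
        lo, hi = lo0 * ratio, hi0 * ratio
        err = mse(values, fake_quant(values, lo, hi, n_bits))
        if err < best[0]:
            best = (err, lo, hi)
    return best


# Sparse synthetic activations (assumed): mostly zeros, a few larger values.
acts = [0.0] * 900 + [0.1 * i for i in range(100)] + [55.0]
err, lo, hi = maxmin_grid_search(acts)
print(f"range=[{lo:.2f}, {hi:.2f}]  mse={err:.6f}")
```

Because the full max-min range is always the first candidate, the search can only match or reduce the reconstruction error relative to plain max-min calibration.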

TGPL Optimizer (Optimization)

Fine-tune activation quantization parameters using task supervision

Model or implementation: Gradient descent on quantization params

Adaptive Rounding (Optimization)

Optimize weight rounding to minimize layer-wise error

Model or implementation: Learnable parameter theta (0 to 1)

Novel Architectural Elements

Incorporation of a rounding variable theta into the quantization function: x_int = clamp(floor(x/s + theta) + z, ...)
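As a rough illustration, the quantization function can be written out following the formula exactly as it appears above. The clamp bounds are elided in the formula, so `q_min` and `q_max` are left as caller-supplied parameters here, and the unsigned 8-bit values used in the demo are an assumption.

```python
import math


def adaptive_round_quantize(x, scale, zero_point, theta, q_min, q_max):
    """x_int = clamp(floor(x / s + theta) + z, q_min, q_max)."""
    x_int = math.floor(x / scale + theta) + zero_point
    return max(q_min, min(q_max, x_int))


Q_MIN, Q_MAX = 0, 255   # assumed unsigned 8-bit bounds

# The same value lands on different integers depending on theta.
for theta in (0.0, 0.5, 0.9):
    q = adaptive_round_quantize(0.32, 0.1, 0, theta, Q_MIN, Q_MAX)
    print(f"theta={theta}: x_int={q}")
```

In this form theta = 0 always rounds down, theta = 0.5 behaves like round-to-nearest, and values closer to 1 push the result up, so learning theta amounts to learning the rounding direction for each weight.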

Modeling

Base Model: CenterPoint (versions: CenterPoint-Pillar and CenterPoint-Voxel)

Training Method: Gradient-based optimization of quantization parameters (scale, zero-point, rounding)

Objective Functions:

Purpose: Minimize difference between quantized predictions and FP32 teacher predictions.

Formally: L_TGPL = L_cls + alpha * L_reg (using Focal and L1 loss).

Purpose: Minimize layer-wise feature reconstruction error.

Formally: L_local = || W*I - W_hat*I ||_F^2.
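The local reconstruction term can be checked numerically with a tiny example. The sketch below treats `W * I` as a plain matrix product on nested lists; real layers would be convolutions, and the matrices here are made-up values chosen so the result is easy to verify by hand.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def local_reconstruction_loss(w, w_hat, inputs):
    """Squared Frobenius norm of (W @ I - W_hat @ I)."""
    ref = matmul(w, inputs)
    quant = matmul(w_hat, inputs)
    return sum(
        (r - q) ** 2
        for ref_row, quant_row in zip(ref, quant)
        for r, q in zip(ref_row, quant_row)
    )


W = [[1.0, 2.0], [3.0, 4.0]]        # full-precision weights (illustrative)
W_hat = [[1.0, 2.0], [3.0, 3.5]]    # quantized weights (illustrative)
I = [[1.0], [2.0]]                  # layer input (illustrative)

loss = local_reconstruction_loss(W, W_hat, I)
print(loss)   # 1.0: outputs are [5, 11] vs [5, 10]
```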

Adaptation: Optimization of quantization parameters only (weights are frozen except for rounding decision)

Training Data:

256 frames randomly sampled from Waymo training set (0.16% of data)

Key Hyperparameters:

activation_lr: 5e-5
weight_rounding_lr: 5e-3
TGPL_gamma_threshold: 0.1
TGPL_top_K: 500
calibration_samples: 256
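To make the roles of `TGPL_top_K` and `TGPL_gamma_threshold` concrete, here is a hedged sketch of one way pseudo-labels could be selected and scored. It assumes top-K selection followed by a score threshold, hard positive targets for the classification term, the standard focal-loss form without class balancing, and quantized predictions that are already aligned one-to-one with the pseudo-labels. None of these details are confirmed by this summary, and all function names are hypothetical.

```python
import math


def select_pseudo_labels(fp_preds, top_k, score_thresh):
    """Keep the top_k FP32 predictions by score, then drop low-confidence ones.

    fp_preds is a list of (score, box) pairs.
    """
    ranked = sorted(fp_preds, key=lambda p: p[0], reverse=True)[:top_k]
    return [p for p in ranked if p[0] > score_thresh]


def focal_loss_positive(p, focusing=2.0, eps=1e-12):
    """Focal loss for a positive target: -(1 - p)^focusing * log(p)."""
    return -((1.0 - p) ** focusing) * math.log(max(p, eps))


def l1_loss(box_a, box_b):
    return sum(abs(a - b) for a, b in zip(box_a, box_b)) / len(box_a)


def tgpl_loss(quant_preds, pseudo_labels, reg_weight=1.0):
    """L_cls + reg_weight * L_reg, averaged over the pseudo-labels.

    Assumes quant_preds[i] corresponds to pseudo_labels[i].
    """
    if not pseudo_labels:
        return 0.0
    n = len(pseudo_labels)
    cls = sum(focal_loss_positive(score) for score, _ in quant_preds)
    reg = sum(
        l1_loss(q_box, fp_box)
        for (_, q_box), (_, fp_box) in zip(quant_preds, pseudo_labels)
    )
    return cls / n + reg_weight * reg / n


# Illustrative FP32 predictions: (score, box parameters).
fp_preds = [
    (0.90, [1.0, 2.0, 0.5]),
    (0.05, [9.0, 9.0, 9.0]),
    (0.60, [4.0, 1.0, 0.5]),
    (0.30, [7.0, 3.0, 0.5]),
]
labels = select_pseudo_labels(fp_preds, top_k=500, score_thresh=0.1)

# Illustrative quantized-model predictions at the same locations.
quant_preds = [
    (0.85, [1.1, 2.0, 0.5]),
    (0.55, [4.0, 1.2, 0.5]),
    (0.20, [7.0, 3.0, 0.8]),
]
loss = tgpl_loss(quant_preds, labels)
print(len(labels), round(loss, 4))
```

The prediction with score 0.05 falls below the 0.1 threshold and is dropped, so only confident FP32 outputs supervise the quantized model.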

Compute: ~3 GPU hours for calibration (vs. ~94 GPU hours for QAT). Inference speed was measured on an NVIDIA Jetson AGX Orin.

Comparison to Prior Work

vs. BRECQ/QDrop: LiDAR-PTQ uses max-min instead of entropy and adds task-specific loss (TGPL) to handle point cloud sparsity.
vs. PD-Quant: PD-Quant relies on BN statistics which fail on sparse LiDAR data; LiDAR-PTQ uses direct output alignment.
vs. Stacker et al. (2021) [not cited in paper]: Stacker et al. simply identified the entropy failure; LiDAR-PTQ proposes a solution (TGPL + Adaptive Rounding) to fix it.

Limitations

Requires a calibration process that takes ~3 GPU hours (slower than simple PTQ methods like Entropy, though faster than QAT).
Experiments focused on CenterPoint, FSD, and SPVNAS; generalization to transformer-based 3D detectors not explicitly tested.
The TGPL loss introduces hyperparameters (gamma, K) that may need tuning for new datasets.

Reproducibility

Code: https://github.com/StiphyJay/LiDAR-PTQ

The Waymo Open Dataset is public. Calibration uses only 256 unlabeled frames.

📊 Experiments & Results

Evaluation Setup

3D Object Detection on large-scale autonomous driving datasets.

Benchmarks:

Waymo Open Dataset (WOD) (3D Object Detection)
nuScenes (3D Object Detection)
SemanticKITTI (Point Cloud Segmentation)

Metrics:

mAPH (Mean Average Precision weighted by Heading) Level 2
mAP (Mean Average Precision)
NDS (NuScenes Detection Score)
Statistical methodology: Not explicitly reported in the paper

Key Results

Quantization performance on Waymo Open Dataset (WOD) showing LiDAR-PTQ maintains near-FP32 performance while baselines collapse.
Benchmark	Metric	Baseline	This Paper	Δ
Waymo Open Dataset (WOD)	mAPH (Level 2)	60.32	60.12	-0.20
Waymo Open Dataset (WOD)	mAPH (Level 2)	56.27	60.12	+3.85
Waymo Open Dataset (WOD)	mAPH (Level 2)	21.65	60.12	+38.47
Waymo Open Dataset (WOD)	mAPH (Level 2)	65.25	65.18	-0.07
Results on Fully Sparse Detectors (FSD) confirming generalization.
Waymo Open Dataset (WOD)	mAPH (Level 2)	9.44	70.73	+61.29
Inference speedup analysis on Edge Hardware.
NVIDIA Jetson AGX Orin	FPS	38.4	113.4	+75.0
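The headline ratios can be recomputed from the figures reported in this summary. The short check below uses the FPS values from the table and the approximate GPU-hour figures quoted for calibration and QAT, so the calibration ratio in particular is only as precise as those rounded inputs.

```python
fp32_fps, int8_fps = 38.4, 113.4          # Jetson AGX Orin, from the table
calib_gpu_hours, qat_gpu_hours = 3, 94    # approximate values quoted in the summary

speedup = int8_fps / fp32_fps
calib_ratio = qat_gpu_hours / calib_gpu_hours
fps_delta = int8_fps - fp32_fps

print(f"inference speedup: {speedup:.2f}x")      # 2.95x
print(f"calibration ratio: {calib_ratio:.1f}x")  # 31.3x
print(f"FPS gain: {fps_delta:.1f}")              # 75.0
```

These agree with the rounded claims of roughly 3x faster inference and roughly 30x cheaper calibration than QAT.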

Experiment Figures

Trade-off between Inference Speed (FPS) and Accuracy (mAPH) for CenterPoint-Voxel and CenterPoint-Pillar.

Comparison of activation distributions between RGB images and LiDAR point clouds.

Main Takeaways

Standard Entropy calibration is catastrophic for LiDAR models because it truncates large-value coordinates (distant points) that are essential for 3D geometry.
Max-min calibration combined with grid search clearly outperforms entropy calibration on the sparse point cloud models tested.
LiDAR-PTQ achieves FP32-level accuracy with 8-bit quantization across Pillar-based, Voxel-based, and Fully Sparse detectors.
The method is cost-effective, offering 30x faster calibration than QAT while achieving comparable accuracy.

📚 Prerequisite Knowledge

Prerequisites

Quantization fundamentals (scale, zero-point, clipping error, rounding error)
LiDAR 3D Object Detection (Voxelization, BEV features)
Standard Loss Functions (Focal Loss, L1 Loss)

Key Terms

PTQ: Post-Training Quantization—compressing a model after training using a small unlabeled calibration set, without full retraining.

QAT: Quantization-Aware Training—retraining a model while simulating quantization effects, requiring full labeled datasets and high compute.

mAPH: Mean Average Precision weighted by Heading—a primary metric for 3D detection that penalizes boxes with incorrect orientation.

BEV: Bird's Eye View—a top-down 2D representation of 3D point cloud data.

Entropy Calibration: A method that selects quantization ranges by minimizing KL divergence between original and quantized distributions; often cuts off 'outliers'.

SPConv: Sparse Convolution—convolutional operations optimized for sparse data inputs like point clouds.

L2 (Level 2): A difficulty level in the Waymo Open Dataset covering objects with at least one LiDAR point (including hard cases with few points).