_comment: REQUIRED: Define ALL technical terms, acronyms, and method names used ANYWHERE in the entire summary. After drafting the summary, perform a MANDATORY POST-DRAFT SCAN: check every section individually (Core.one_sentence_thesis, evaluation_highlights, core_problem, Technical_details, Experiments.key_results notes, Figures descriptions and key_insights). HIGH-VISIBILITY RULE: Terms appearing in one_sentence_thesis, evaluation_highlights, or figure key_insights MUST be defined—these are the first things readers see. COMMONLY MISSED: PPO, DPO, MARL, dense retrieval, silver labels, cosine schedule, clipped surrogate objective, Top-k, greedy decoding, beam search, logit, ViT, CLIP, Pareto improvement, BLEU, ROUGE, perplexity, attention heads, parameter sharing, warm start, convex combination, sawtooth profile, length-normalized attention ratio, NTP. If in doubt, define it.
dLLM: Diffusion Large Language Model—a generative model that creates text by iteratively denoising a whole sequence (often starting from masks or noise) rather than predicting tokens autoregressively, one at a time
PTQ: Post-Training Quantization—compressing a model's weights and activations to lower precision (e.g., 4-bit integers) after training is complete, without extensive retraining
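The core operation behind PTQ can be illustrated with a minimal sketch (my own toy example, not any specific method's implementation): symmetric uniform quantization maps floats to signed integers using a single per-tensor scale, then dequantizes back.

```python
import numpy as np

def quantize_symmetric(x, bits=4):
    # Symmetric uniform quantization: map floats to signed integers in
    # [-2^(b-1), 2^(b-1)-1] using one scale for the whole tensor.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; error is bounded by scale / 2 per value.
    return q.astype(np.float32) * scale

w = np.array([0.12, -0.5, 0.33, 0.9, -0.07], dtype=np.float32)
q, s = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, s)
```

Real PTQ methods refine this basic round-to-nearest scheme (per-channel scales, calibration data, reconstruction objectives), but all of them build on this float-to-integer mapping.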
GPTQ: Generative Pre-trained Transformer Quantization—a weight-only quantization method that minimizes layer-wise reconstruction error using second-order Hessian information
AWQ: Activation-aware Weight Quantization—a method that protects important weights based on activation magnitudes to preserve performance
SmoothQuant: A weight-activation quantization technique that mathematically smooths activation outliers by migrating the quantization difficulty from activations to weights
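The migration trick in SmoothQuant rests on an exact algebraic identity: for per-input-channel scales s, X @ W equals (X / s) @ (W * s), so activation outliers can be divided away and absorbed into the weights. A minimal sketch of this identity (shapes and the alpha=0.5 balancing exponent are illustrative assumptions, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
X[:, 2] *= 50.0                     # channel 2 simulates an activation outlier
W = rng.normal(size=(4, 3))

# Per-channel smoothing scale: balance activation vs. weight magnitudes.
alpha = 0.5
s = np.abs(X).max(axis=0) ** alpha / np.abs(W).max(axis=1) ** (1 - alpha)

X_s = X / s            # smoothed activations: the outlier channel is tamed
W_s = W * s[:, None]   # weights absorb the scale, becoming harder to quantize

# The product is unchanged, so the transform is lossless in full precision.
assert np.allclose(X @ W, X_s @ W_s)
```

After smoothing, the activation range is much tighter, so a shared low-bit scale wastes far less precision on the formerly dominant channel.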
QuaRot: Quantization with Rotation—a method that applies randomized Hadamard rotations to weights and activations to eliminate outliers and make the data distribution more quantization-friendly
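The rotation idea can be sketched in a few lines (a simplified illustration with a Sylvester-constructed Hadamard matrix; QuaRot itself uses randomized rotations fused into the model's weights): an orthogonal rotation spreads a single outlier channel's energy evenly across all channels while leaving the layer's output exactly unchanged.

```python
import numpy as np

def hadamard(n):
    # Sylvester construction of an n x n Hadamard matrix (n a power of 2).
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

d = 8
Q = hadamard(d) / np.sqrt(d)       # orthogonal: Q @ Q.T == I

x = np.zeros(d)
x[3] = 100.0                        # one massive outlier channel
x_rot = x @ Q                       # energy spread evenly: |entries| = 100/sqrt(8)

# Rotating weights by Q.T compensates exactly, so computation is preserved.
W = np.random.default_rng(1).normal(size=(d, d))
assert np.allclose(x @ W, x_rot @ (Q.T @ W))
```

Because the rotated activations have a much smaller dynamic range, a low-bit quantizer applied to x_rot loses far less precision than one applied to x directly.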
DuQuant: A rotation-based quantization method that uses outlier-aware rotations and channel permutations to better handle massive outliers
Massive Outliers: Activation values that are significantly larger than the rest of the distribution, often appearing in specific channels or tokens, which skew quantization ranges
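Why a single massive outlier is so damaging can be shown numerically (a toy example of my own, using the same symmetric int4 round-trip idea): the outlier inflates the quantization scale so much that every normal-magnitude value rounds to zero.

```python
import numpy as np

def int4_roundtrip(x):
    # Quantize to signed int4 and back with a per-tensor scale.
    qmax = 7
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -8, 7) * scale

normal = np.array([0.1, -0.2, 0.15, 0.05])
with_outlier = np.append(normal, 80.0)   # one massive outlier

# Without the outlier the scale is ~0.03, so small values survive.
err_clean = np.abs(int4_roundtrip(normal) - normal).max()

# With the outlier the scale jumps to 80/7 ~ 11.4, so every normal
# value rounds to 0 and its information is lost entirely.
err_skewed = np.abs(int4_roundtrip(with_outlier)[:4] - normal).max()
```

This range-skewing effect is exactly what the rotation- and smoothing-based methods above are designed to neutralize.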
W4A4: 4-bit Weights and 4-bit Activations quantization setting
W8A8: 8-bit Weights and 8-bit Activations quantization setting