
Tutorial on Diffusion Models for Imaging and Vision

Stanley H. Chan
School of Electrical and Computer Engineering, Purdue University
arXiv
MM Pretraining

📝 Paper Summary

Generative Models, Computer Vision, Image Synthesis
This tutorial unifies VAEs, DDPMs, score matching, and SDEs into a cohesive mathematical framework, explaining diffusion models as a sequence of small iterative refinements rather than a one-step generative process.
Core Problem
Traditional generative models such as VAEs rely on 'one-step' generation: a single neural network must map a simple distribution (such as Gaussian noise) to a complex data distribution (such as images) in one pass, which is difficult to learn and control.
Why it matters:
  • One-step generation places an immense burden on the decoder network, which must learn the entire complex mapping in a single pass, limiting sample quality.
  • Generative tools have grown explosively, yet the mathematical connections between seemingly different approaches (VAE vs. Diffusion vs. Score Matching) remain fragmented for many new researchers.
  • Understanding the underlying 'incremental' nature of diffusion is critical for developing better sampling mechanisms and applications in text-to-image and text-to-video generation.
Concrete Example: In a VAE, a decoder must instantly transform a noise vector z ~ N(0,I) into a realistic image x. This is like trying to turn a ship 180 degrees in a single second. Diffusion models instead turn the ship incrementally, making small adjustments (denoising steps) that are easier to manage and learn.
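The incremental idea can be illustrated with the forward (noising) chain that the denoising steps are trained to undo. The sketch below is a minimal, scalar illustration and not the tutorial's code: the function names, the linear schedule, and its endpoint values (1e-4 to 0.02) are assumptions chosen for illustration. It applies many small steps of the form x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps_t and compares the result with the closed-form scaling obtained by multiplying the per-step factors.

```python
import math


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule; the endpoint values are assumptions."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_chain(x0, betas, noises):
    """Apply x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps_t, one small step at a time."""
    x = x0
    for beta, eps in zip(betas, noises):
        x = math.sqrt(1.0 - beta) * x + math.sqrt(beta) * eps
    return x


def signal_and_noise_scale(betas):
    """Closed-form scales after all steps: x_T = signal * x0 + noise * eps."""
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta  # cumulative product of (1 - beta_t)
    return math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
```

Passing an all-zero noise list to `forward_chain` isolates the signal term, so its output should match `signal * x0` from `signal_and_noise_scale`. Each individual step shrinks the signal only slightly, yet after many steps almost nothing of `x0` remains, which is the "turn the ship gradually" picture run in the noising direction.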
Key Novelty
Unified Educational Framework for Diffusion
  • Frames Diffusion Models (DDPM) as a 'multi-step VAE' where generation is broken into a chain of small, incremental denoising updates rather than a single massive decoding step.
  • Demonstrates that minimizing the Evidence Lower Bound (ELBO) in this multi-step chain is mathematically equivalent to minimizing a weighted squared error between predicted and actual noise.
  • Connects discrete iterative algorithms (DDPM, SMLD) to continuous-time Stochastic Differential Equations (SDEs), showing they are discretizations of the same underlying physical processes (Langevin dynamics).
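The link to Langevin dynamics in the last point can be sketched with a discretized update of the form x_{k+1} = x_k + tau * score(x_k) + sqrt(2 * tau) * z, where z is standard Gaussian noise. The example below is a toy illustration under stated assumptions: the target is a one-dimensional Gaussian whose score is known analytically, whereas in a diffusion or score-matching model the score would be approximated by a trained network. The function names, step size, and step count are hypothetical.

```python
import math
import random


def gaussian_score(x, mu, sigma):
    """Score (gradient of the log-density) of N(mu, sigma^2); an assumed toy target."""
    return -(x - mu) / (sigma ** 2)


def langevin_sample(mu, sigma, step_size=0.05, num_steps=200, rng=None):
    """Draw one approximate sample via discretized Langevin dynamics."""
    if rng is None:
        rng = random.Random()
    x = rng.gauss(0.0, 1.0)  # start from simple Gaussian noise
    for _ in range(num_steps):
        z = rng.gauss(0.0, 1.0)
        # small drift toward high density plus injected noise
        x = x + step_size * gaussian_score(x, mu, sigma) + math.sqrt(2.0 * step_size) * z
    return x
```

Calling `langevin_sample(3.0, 1.0, rng=random.Random(0))` repeatedly should yield values scattered around 3 with spread close to 1. A finite step size introduces a small bias in the spread, which is one reason the continuous-time SDE view is useful for reasoning about discretization.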
Breakthrough Assessment
9/10
While not presenting a new algorithm, this tutorial provides an exceptionally clear, mathematically grounded unification of VAEs, DDPMs, and SDEs, making complex topics accessible to researchers.