
Gradient Flow Drifting: Generative Modeling via Wasserstein Gradient Flows of KDE-Approximated Divergences

Jiarui Cao, Zixuan Wei, Yuxin Liu
The Chinese University of Hong Kong, Civil Aviation University of China
arXiv (2026)
MM

📝 Paper Summary

Generative Modeling · Gradient Flows · Kernel Methods
Gradient Flow Drifting proves that the empirical Drifting Model is mathematically equivalent to the Wasserstein gradient flow of the forward KL divergence on KDE-smoothed densities, which yields a unified framework that extends to other divergences.
Core Problem
The recently proposed Drifting Model achieves state-of-the-art generation but lacks a solid theoretical foundation: its analysis is heuristic, and its identifiability proofs rest on complex assumptions.
Why it matters:
  • Current theoretical gaps make it difficult to understand why Drifting Models converge or to systematically improve them.
  • Existing proofs for model identifiability (knowing when the model has learned the true distribution) require strong, often unrealistic smoothness assumptions.
  • A lack of unification prevents researchers from combining the strengths of different divergences (like MMD for mode coverage vs. KL for precision) in a principled way.
Concrete Example: In the original Drifting Model, the drifting field is derived heuristically. Without the gradient flow connection, it is unclear how to modify the loss function to explicitly prevent mode collapse (missing data modes) or mode blurring (fuzzy images), which are characteristic failures of pure KL-based minimization.
Key Novelty
Gradient Flow Drifting
  • Identifies that the 'drifting field' in Drifting Models is exactly the particle velocity field of the Wasserstein-2 gradient flow for the KL divergence of KDE-smoothed densities.
  • Generalizes the framework to allow any f-divergence (e.g., Reverse KL, Chi-squared) or MMD, where the drift velocity is always proportional to the difference of KDE log-density gradients.
  • Proves that mixing velocity fields from different divergences (e.g., Reverse KL + Chi-squared) creates a valid combined gradient flow that balances mode-seeking and mode-covering behaviors.
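To make the "difference of KDE log-density gradients" concrete, here is a minimal NumPy sketch for the KL case only. It assumes a Gaussian kernel with a single bandwidth and the standard Wasserstein-2 velocity for KL between the generated and data densities, v(x) = ∇log p(x) − ∇log q(x), with both densities replaced by their KDEs. The function names `kde_score` and `drift_field`, the bandwidth, and the step size are illustrative choices, not taken from the paper, and the sketch moves particles directly rather than training a generator.

```python
import numpy as np

def kde_score(x, samples, bandwidth):
    """Gradient of the log of a Gaussian KDE built on `samples`, at each row of `x`.

    For a Gaussian kernel this is a softmax-weighted mean-shift vector
    divided by bandwidth**2.
    """
    diff = samples[None, :, :] - x[:, None, :]        # (n, m, d): sample minus query
    sq_dist = np.sum(diff ** 2, axis=-1)              # (n, m)
    log_w = -sq_dist / (2.0 * bandwidth ** 2)
    log_w -= log_w.max(axis=1, keepdims=True)         # numerical stability
    w = np.exp(log_w)
    w /= w.sum(axis=1, keepdims=True)                 # normalized kernel weights
    return np.einsum("nm,nmd->nd", w, diff) / bandwidth ** 2

def drift_field(x, data, generated, bandwidth):
    """Assumed KL-case drift: KDE score of the data minus KDE score of the generated samples."""
    return kde_score(x, data, bandwidth) - kde_score(x, generated, bandwidth)

# Illustrative particle simulation (hypothetical setup, explicit Euler steps).
rng = np.random.default_rng(0)
data = rng.normal(3.0, 0.5, size=(200, 2))
particles = rng.normal(0.0, 1.0, size=(100, 2))
for _ in range(100):
    particles = particles + 0.05 * drift_field(particles, data, particles, bandwidth=1.0)
```

The first term attracts particles toward regions where the data KDE is dense, while the second repels them from regions already crowded with generated samples. When the generated and data sample sets coincide, the two terms cancel and the drift vanishes.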
Architecture
Figure: Algorithm 1 (conceptual), the training procedure for Gradient Flow Drifting.
Breakthrough Assessment
8/10
Provides a rigorous mathematical foundation for a high-performing empirical method. The unification of MMD, Drifting Models, and f-divergences into a single kernel-based gradient flow framework is a significant theoretical advance.