
Energy-Weighted Flow Matching for Offline Reinforcement Learning

Shiyuan Zhang, Weitong Zhang, Q. Gu
Tsinghua University, University of California, Los Angeles
International Conference on Learning Representations (2025)
RL

📝 Paper Summary

Generative Modeling, Offline Reinforcement Learning
Energy-Weighted Flow Matching (EFM) learns energy-guided distributions directly by weighting the flow matching loss with sample energy, eliminating the need for auxiliary guidance models or complex sampling.
Core Problem
Existing energy-guided generative models (diffusion/flow matching) require training auxiliary models to estimate intermediate guidance or involve expensive back-propagation through energy functions during sampling.
Why it matters:
  • Auxiliary models introduce approximation errors and increase training complexity
  • Gradient-based guidance during sampling (e.g., in classifier guidance) increases computational cost and inference time
  • Accurate energy guidance is critical for tasks like molecule design and offline RL, where generating high-reward samples is the goal
Concrete Example: In offline RL methods like QGPO, guiding a policy toward high-return regions requires learning an intermediate time-dependent energy function via contrastive learning and computing its gradient at every sampling step. EFM removes both steps: it weights the flow matching training loss by the Q-values of the dataset samples, so sampling needs neither an auxiliary energy model nor an energy gradient.
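To make the sampling-side contrast concrete, here is a minimal sketch of drawing samples from an already-trained velocity field with plain Euler integration. Note that no energy or Q-function gradient is evaluated inside the loop. The names `sample_euler` and `velocity_fn`, the time convention (noise at t = 0, data at t = 1), and the fixed-step Euler solver are illustrative assumptions, not details taken from the paper; a policy for offline RL would additionally condition on the state, which is omitted here.

```python
import numpy as np

def sample_euler(velocity_fn, n_samples, dim, n_steps=50, rng=None):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 (samples).

    velocity_fn is a hypothetical stand-in for a trained velocity network;
    it takes x of shape (n, dim) and t of shape (n, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal((n_samples, dim))  # start from Gaussian noise
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = np.full((n_samples, 1), k * dt)
        # Only the learned velocity is queried; no energy gradient appears here.
        x = x + dt * velocity_fn(x, t)
    return x
```

With a gradient-guided sampler, each iteration of this loop would also need a backward pass through an energy model; in this sketch the per-step cost is a single forward evaluation of `velocity_fn`.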
Key Novelty
Energy-Weighted Flow Matching (EFM) & Q-weighted Iterative Policy Optimization (QIPO)
  • Proposes a new flow matching objective in which each sample's regression loss is weighted according to the energy (or, in offline RL, the Q-value) of its target data point
  • Theoretically proves that this weighted objective allows the learned velocity field to exactly match the energy-guided distribution without auxiliary time-dependent energy estimators
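The weighted objective described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes a Boltzmann-style weight proportional to exp(-beta * energy), self-normalized over the batch, and a linear interpolation path between noise and data. The names `energy_weights`, `weighted_fm_loss`, `velocity_fn`, and the parameter `beta` are hypothetical. For offline RL, one would pass `energy = -Q(s, a)` so that high-value actions receive larger weights.

```python
import numpy as np

def energy_weights(energy, beta=1.0):
    """Batch-normalized weights proportional to exp(-beta * energy), with mean 1."""
    logits = -beta * np.asarray(energy, dtype=float)
    logits -= logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.mean()

def weighted_fm_loss(velocity_fn, x1, energy, beta=1.0, rng=None):
    """Energy-weighted conditional flow matching loss on one batch.

    x1:     data samples, shape (n, dim)
    energy: per-sample energies, shape (n,)  (assumed given, e.g. -Q values)
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)   # noise endpoint
    t = rng.uniform(size=(n, 1))         # random time in [0, 1)
    xt = (1.0 - t) * x0 + t * x1         # linear path (an assumption)
    target = x1 - x0                     # conditional velocity for that path
    pred = velocity_fn(xt, t)
    per_sample = np.sum((pred - target) ** 2, axis=1)
    # The only change from plain flow matching: reweight each sample's loss.
    return float(np.mean(energy_weights(energy, beta) * per_sample))
```

Setting `beta=0` makes every weight equal to 1 and recovers the unweighted flow matching loss, which is a convenient sanity check for this sketch.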
Breakthrough Assessment
8/10
Offers a theoretically grounded, simplified approach to guided generation that removes the need for auxiliary models, a significant bottleneck in prior energy-guided diffusion/flow work.