
PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection

Peiyao Wang, Weining Wang, Qi Li
arXiv.org (2025)
MM RL Factuality Benchmark

📝 Paper Summary

Text-to-Video Generation, Physics-based Alignment, Reward Modeling
PhysCorr improves the physical realism of generated videos by using a specialized dual-reward model to guide Direct Preference Optimization (DPO) towards physically plausible dynamics.
Core Problem
State-of-the-art text-to-video models frequently violate fundamental physical laws (e.g., fluid dynamics, rigid body interactions) despite high visual fidelity.
Why it matters:
  • Current reward models focus on aesthetics and text alignment, neglecting physical plausibility like gravity or collision response.
  • Human preference datasets prioritize visual appeal over physical accuracy, creating a misalignment between training objectives and real-world constraints.
  • Generative models for robotics and simulation require strict adherence to physics, which current purely data-driven diffusion models fail to guarantee.
Concrete Example: In a generated video of waves crashing against a cliff, the water may continue rising indefinitely instead of rebounding (violating fluid dynamics), or a knife cutting meat may leave no mark (violating material interaction principles).
Key Novelty
PhysCorr (Physics-Constrained Text-to-Video Generation)
  • Introduces PhysicsRM, a lightweight reward model that explicitly evaluates both subject consistency (geometry stability) and mechanical coherence (causal interactions) to score videos.
  • Proposes PhyDPO, a specialized Direct Preference Optimization method that re-weights training samples based on the magnitude of physical violations, prioritizing correction of severe errors.
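The two ideas above (a dual reward that drives preference selection, and a DPO loss re-weighted by violation magnitude) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's formulation: the names `PhysicsScore`, `select_pair`, and `weighted_dpo_loss`, the convex combination with `alpha`, the weight `1 + gamma * gap`, and the use of scalar log-likelihoods (diffusion variants of DPO typically substitute denoising errors for these terms).

```python
import math
from dataclasses import dataclass


@dataclass
class PhysicsScore:
    """Hypothetical dual reward for one generated video, each term assumed in [0, 1]."""
    subject_consistency: float   # geometry stability
    mechanical_coherence: float  # causal interactions

    def total(self, alpha: float = 0.5) -> float:
        # Assumed convex combination; the paper may combine the two rewards differently.
        return alpha * self.subject_consistency + (1.0 - alpha) * self.mechanical_coherence


def select_pair(scores, alpha=0.5):
    """Pick the best- and worst-scored candidates as (winner, loser, reward gap)."""
    totals = [s.total(alpha) for s in scores]
    winner = max(range(len(totals)), key=totals.__getitem__)
    loser = min(range(len(totals)), key=totals.__getitem__)
    return winner, loser, totals[winner] - totals[loser]


def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                      gap, beta=0.1, gamma=1.0):
    """Standard DPO loss scaled by an assumed severity weight.

    logp_* are policy log-likelihoods of the preferred (w) and rejected (l)
    samples; ref_logp_* are the same quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    weight = 1.0 + gamma * gap  # hypothetical: larger physics gap, larger weight
    return -weight * log_sigmoid(margin)


if __name__ == "__main__":
    candidates = [
        PhysicsScore(0.9, 0.8),
        PhysicsScore(0.4, 0.2),
        PhysicsScore(0.7, 0.6),
    ]
    w, l, gap = select_pair(candidates)
    loss = weighted_dpo_loss(-10.0, -12.0, -10.5, -11.5, gap)
    print(w, l, round(gap, 2), round(loss, 4))
```

With this weighting, a pair whose rejected sample violates physics more severely (a larger reward gap) contributes more to the gradient than a pair with a marginal difference, which is the prioritization the summary attributes to PhyDPO.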
Architecture
Figure 2: The complete PhysCorr pipeline, including the PhysicsRM reward model structure and the PhyDPO training loop.
Evaluation Highlights
  • Significantly improves physical realism metrics across multiple VBench2 dimensions compared to base models such as Wan2.1 and VideoCrafter2.
  • Achieves parameter efficiency by distilling physical reasoning capabilities from a 7B VLM into a 0.5B reward model (PhysicsRM) with 98% accuracy retention.
Breakthrough Assessment
8/10
Addresses a critical and under-explored gap in video generation (physics compliance) with a principled dual-reward approach. The distillation strategy for efficient reward modeling is highly practical.