
From Flow to One Step: Real-Time Multi-Modal Trajectory Policies via Implicit Maximum Likelihood Estimation-based Distribution Distillation

Ju Dong, Liding Zhang, Lei Zhang, Yu Fu, Kaixin Bai, Zoltan-Csaba Marton, Zhenshan Bing, Zhaopeng Chen, Alois Christian Knoll, Jianwei Zhang
arXiv (2026)
MM RL

📝 Paper Summary

Robotic Manipulation Generative Policy Learning
This paper accelerates high-fidelity robotic manipulation policies by distilling a slow multi-step flow-matching teacher into a fast single-step student using a set-level Implicit Maximum Likelihood Estimation objective.
Core Problem
Generative policies such as diffusion and flow matching capture multi-modal behaviors well, but they are too slow for high-frequency control because sampling an action requires many iterative solver steps. Fast one-step alternatives often suffer from mode collapse, averaging diverse possibilities into invalid actions.
Why it matters:
  • Real-world robots require high-frequency control (>100 Hz) to react to dynamic disturbances, while current generative policies often run at only 2-10 Hz
  • Standard distillation objectives (such as KL divergence or MSE) average out distinct modes, causing robots to fail tasks that admit multiple distinct valid trajectories (e.g., grasping an object from the left vs. the right)
Concrete Example: In a task with two valid paths to an object (left or right), a standard behavior-cloning policy or a naive one-step student might output the average of the two trajectories, which goes straight through the obstacle and results in a collision.
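The averaging failure can be reproduced in a few lines. The sketch below is illustrative only: the waypoints, the obstacle position and radius, and the helper names `mean_path` and `collides` are all assumptions made for this example, not values from the paper. It relies on the fact that the single prediction minimizing summed squared error against several targets is their pointwise mean.

```python
# Two hypothetical 2-D demonstrations that pass an obstacle centred at (1.0, 0.0):
# one swerves left (positive y), the other swerves right (negative y).
left_path = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]
right_path = [(0.0, 0.0), (1.0, -0.5), (2.0, 0.0)]

# Assumed obstacle geometry, chosen only for illustration.
OBSTACLE_CENTER = (1.0, 0.0)
OBSTACLE_RADIUS = 0.2


def mean_path(paths):
    """Pointwise mean of equal-length paths (the squared-error minimiser)."""
    n = len(paths)
    return [
        (sum(p[i][0] for p in paths) / n, sum(p[i][1] for p in paths) / n)
        for i in range(len(paths[0]))
    ]


def collides(path):
    """Return True if any waypoint lies inside the circular obstacle."""
    cx, cy = OBSTACLE_CENTER
    return any(
        (x - cx) ** 2 + (y - cy) ** 2 < OBSTACLE_RADIUS ** 2 for x, y in path
    )


averaged = mean_path([left_path, right_path])

print("left collides:    ", collides(left_path))
print("right collides:   ", collides(right_path))
print("averaged collides:", collides(averaged))
```

Both demonstrations avoid the obstacle, yet their mean passes through its centre, which is the invalid "in-between" action the summary describes.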
Key Novelty
Set-Level IMLE Distillation for Flow Matching
  • Treats the slow teacher as an offline oracle that generates sets of valid future trajectories for a given observation
  • Trains a student to generate a corresponding set of hypotheses in one step, optimized via a bi-directional Chamfer distance
  • This set-based loss forces the student to cover all modes (diversity) and hit only valid modes (fidelity) without averaging them
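A minimal sketch of a bi-directional Chamfer distance between a student set and a teacher set may help make the two terms concrete. This is not the paper's implementation: the squared Euclidean distance, the equal weighting of the two directions, the per-set averaging, and the names `traj_dist_sq` and `chamfer_loss` are assumptions, and trajectories are flattened to plain lists of floats for simplicity.

```python
def traj_dist_sq(a, b):
    """Squared Euclidean distance between two flattened, equal-length trajectories."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def chamfer_loss(student_set, teacher_set):
    """Bi-directional Chamfer distance between two sets of trajectories.

    coverage: every teacher sample should have a nearby student hypothesis,
              which penalises a student that misses a mode.
    fidelity: every student hypothesis should lie near some teacher sample,
              which penalises hypotheses that fall between modes.
    """
    coverage = sum(
        min(traj_dist_sq(t, s) for s in student_set) for t in teacher_set
    ) / len(teacher_set)
    fidelity = sum(
        min(traj_dist_sq(s, t) for t in teacher_set) for s in student_set
    ) / len(student_set)
    return coverage + fidelity


# Toy teacher set with two modes (hypothetical values).
teacher_set = [[1.0, 1.0], [-1.0, -1.0]]

matched_student = [[1.0, 1.0], [-1.0, -1.0]]   # covers both modes
averaged_student = [[0.0, 0.0], [0.0, 0.0]]    # collapses to the mean
single_mode_student = [[1.0, 1.0], [1.0, 1.0]]  # covers only one mode

print("matched:    ", chamfer_loss(matched_student, teacher_set))
print("averaged:   ", chamfer_loss(averaged_student, teacher_set))
print("single mode:", chamfer_loss(single_mode_student, teacher_set))
```

In this toy case the loss is zero only when the student set matches both modes. The averaged student is penalised by both terms, and the single-mode student is penalised by the coverage term alone. In actual training the student set would come from a differentiable network and the loss would be computed with a tensor library so that gradients can flow.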
Evaluation Highlights
  • Achieves 123.5 Hz inference speed on RLBench, a 14.3x speedup over the 50-step teacher (8.6 Hz)
  • Attains 68.6% success rate on RLBench with single-step inference, vastly outperforming Consistency Policy (16.3%) and Diffusion Policy 1-step (1.8%)
  • Real-world deployment achieves 70.0% success at 125 Hz, enabling dynamic re-planning where the 2.9 Hz teacher fails
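The reported rates can be converted into per-action latency to show what they mean for a control loop. The snippet below is simple arithmetic on the rounded figures quoted above; the helper name `period_ms` is an assumption for this example.

```python
def period_ms(rate_hz):
    """Time available per inference call, in milliseconds, at a given rate."""
    return 1000.0 / rate_hz


# RLBench figures quoted in the summary.
student_ms = period_ms(123.5)
teacher_ms = period_ms(8.6)
speedup = 123.5 / 8.6

# Real-world figures quoted in the summary.
real_student_ms = period_ms(125.0)
real_teacher_ms = period_ms(2.9)

print(f"RLBench student: {student_ms:.1f} ms per action")
print(f"RLBench teacher: {teacher_ms:.1f} ms per action")
print(f"RLBench speedup from rounded rates: {speedup:.2f}x")
print(f"Real-world student: {real_student_ms:.1f} ms per action")
print(f"Real-world teacher: {real_teacher_ms:.1f} ms per action")
```

The student needs roughly 8 ms per action, which fits inside the 10 ms budget of a 100 Hz loop, whereas the teacher needs over 100 ms on RLBench and several hundred milliseconds on the real robot.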
Breakthrough Assessment
8/10
Substantially narrows the gap between the high task performance of generative policies and the speed requirements of real-time control, and effectively mitigates mode collapse in one-step distillation.