
Solving Motion Planning Tasks with a Scalable Generative Model

Yi Hu, Siqi Chai, Zhening Yang, Jingyu Qian, Kun Li, Wenxin Shao, Haichao Zhang, Wei Xu, Qiang Liu
European Conference on Computer Vision (2024)
MM Agent RL Pretraining Benchmark

📝 Paper Summary

Autonomous Driving Simulation World Models
GUMP is a generative world model that uses a key-value tokenization strategy and partial autoregressive decoding to unify scene generation, simulation, and planning in autonomous driving.
Core Problem
Autonomous driving systems lack scalable, realistic simulators; existing learning-based models struggle with long-horizon consistency in closed-loop settings, while rule-based simulators fail to capture complex human-like interactivity.
Why it matters:
  • Scalability limits: AD systems struggle to adapt to unseen environments without extensive engineering for failure scenarios.
  • Simulation gap: Open-loop predictions cannot adapt to out-of-distribution states encountered during real-world interactions.
  • Safety validation: Developing safe policies requires high-fidelity, reactive environments that can generate diverse and rare traffic scenarios.
Concrete Example: In a complex intersection, an open-loop model might predict a car continues straight regardless of the ego-vehicle's actions. In a closed-loop scenario, if the ego-vehicle aggressively merges, a realistic simulator (like GUMP) should make the other car yield or swerve, rather than colliding blindly as an open-loop prediction would.
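The difference between the two settings can be illustrated with a toy one-lane rollout. This is only a sketch under stated assumptions: the function `rollout`, the positions, speeds, and the 8 m yield threshold are all hypothetical, and the yielding behaviour is a hand-written rule standing in for what a learned reactive simulator would produce. It is not GUMP's method.

```python
def rollout(reactive: bool, steps: int = 10, dt: float = 0.5) -> float:
    """Return the minimum gap (m) between the merged ego car and the car behind it.

    A negative gap means the two cars have overlapped, i.e. a collision.
    All numbers below are illustrative assumptions.
    """
    ego_pos, ego_speed = 10.0, 4.0      # ego has merged in ahead and drives slowly
    other_pos, other_speed = 0.0, 8.0   # the following car approaches faster
    min_gap = ego_pos - other_pos
    for _ in range(steps):
        gap = ego_pos - other_pos
        if reactive and gap < 8.0:
            # Closed loop: the agent observes the current state and yields.
            other_speed = min(other_speed, ego_speed)
        # Open loop: other_speed is never revised, as in a replayed prediction.
        ego_pos += ego_speed * dt
        other_pos += other_speed * dt
        min_gap = min(min_gap, ego_pos - other_pos)
    return min_gap


open_loop_gap = rollout(reactive=False)
closed_loop_gap = rollout(reactive=True)
print(open_loop_gap, closed_loop_gap)
```

With these assumed numbers the open-loop agent drives through the ego vehicle (negative gap), while the reactive agent settles at a positive following distance.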
Key Novelty
Generative Unified Model for Motion Planning (GUMP)
  • Key-Value Tokenizer: Treats agents as 'keys' (ID + category) and their physical properties as 'values' (quantized states), enabling flexible querying and dynamic agent management.
  • Partial-Autoregressive Acceleration: Converts intra-frame dependencies to non-autoregressive (NAR) parallel decoding to speed up inference without losing inter-frame causal consistency.
  • Unified Downstream Support: A single foundation model acts as a scene generator, a reactive simulator for testing, a planner, and an RL training environment.
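The key-value idea in the first bullet can be sketched as a small data structure. This is a minimal illustration, not the paper's tokenizer: the names `AgentKey`, `quantize`, and `tokenize_state`, the category set, the value ranges, and the bin counts are all assumptions chosen for the example.

```python
import math
from dataclasses import dataclass

CATEGORIES = ("vehicle", "pedestrian", "cyclist")  # assumed label set


@dataclass(frozen=True)
class AgentKey:
    """The 'key' of an agent: a persistent ID plus a category."""
    agent_id: int
    category: str


def quantize(value: float, low: float, high: float, bins: int) -> int:
    """Map a continuous value to an integer bin index in [0, bins - 1]."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * bins)
    return min(index, bins - 1)


def tokenize_state(x: float, y: float, heading: float) -> tuple[int, int, int]:
    """The 'value' of an agent: its physical state as discrete tokens.

    Ranges and bin counts are hypothetical (1 m position bins, 10 degree
    heading bins).
    """
    return (
        quantize(x, -100.0, 100.0, 200),
        quantize(y, -100.0, 100.0, 200),
        quantize(heading, -math.pi, math.pi, 36),
    )


# One frame is a mapping from keys to value tokens.
frame: dict[AgentKey, tuple[int, int, int]] = {}
car = AgentKey(agent_id=7, category="vehicle")
walker = AgentKey(agent_id=8, category="pedestrian")
frame[car] = tokenize_state(0.0, -100.0, 0.0)       # agent enters the scene
frame[walker] = tokenize_state(100.0, 12.3, 1.0)
car_tokens = frame[car]                              # query an agent by key
del frame[walker]                                    # agent leaves the scene
```

Keeping identity in the key and state in the value means agents can be added, queried, or dropped per frame without reordering the rest of the sequence, which is one plausible reading of "dynamic agent management" above.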
Architecture
Figure 2: The overall architecture of GUMP, detailing the flow from static/dynamic inputs to the final trajectory decoding.
Evaluation Highlights
  • Achieves state-of-the-art performance on simulation realism and scene generation benchmarks (Waymo and nuPlan).
  • The planner module built on the world model outperforms prior art on planning benchmarks.
  • Significantly improves inference and training speed via the partial-autoregressive mode while maintaining generative capability.
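To see why decoding a frame's tokens in parallel can help inference speed, it is enough to count sequential decoding steps. The sketch below is a back-of-the-envelope model, not a measurement from the paper: the function `sequential_steps` and the scene sizes are hypothetical, and it assumes, as a simplification, that partial-autoregressive mode emits all tokens of a frame in one parallel pass while frames are still decoded one after another.

```python
def sequential_steps(
    num_frames: int,
    agents_per_frame: int,
    tokens_per_agent: int,
    partial_ar: bool,
) -> int:
    """Count decoder passes that must run one after another.

    Full autoregression emits one token per pass. The assumed partial-AR
    mode emits a whole frame per pass and keeps the frame-to-frame order.
    """
    tokens_per_frame = agents_per_frame * tokens_per_agent
    steps_per_frame = 1 if partial_ar else tokens_per_frame
    return num_frames * steps_per_frame


# Hypothetical scene: 10 future frames, 32 agents, 4 value tokens per agent.
full_ar = sequential_steps(10, 32, 4, partial_ar=False)
partial = sequential_steps(10, 32, 4, partial_ar=True)
print(full_ar, partial)
```

The step count only bounds latency; actual speedups depend on hardware and on how much each parallel pass costs.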
Breakthrough Assessment
8/10
Proposes a highly versatile architecture that effectively merges simulation, generation, and planning. The key-value tokenization and partial-AR speedup address critical bottlenecks in deploying transformers for real-time AD simulation.