
VADv2: End-to-End Vectorized Autonomous Driving via Probabilistic Planning

Shaoyu Chen, Bo Jiang, Hao Gao, Bencheng Liao, Qing Xu, Qian Zhang, Chang Huang, Wenyu Liu, Xinggang Wang
Huazhong University of Science & Technology
arXiv.org (2024)
MM Agent Benchmark

Paper Summary

End-to-end Autonomous Driving | Probabilistic Planning | Vectorized Scene Representation
VADv2 replaces deterministic trajectory regression with probabilistic planning by modeling the action space as a distribution over a large vocabulary of feasible trajectories, selecting actions via sampling.
Core Problem
Deterministic planning models assume a fixed relationship between environment and action, failing to capture the multi-modal, non-convex nature of human driving behavior.
Why it matters:
  • Human driving is inherently stochastic; identical scenarios can yield valid but distinct maneuvers (e.g., yielding vs. overtaking), which deterministic regression averages into unsafe 'in-between' actions.
  • Deterministic models tend to collapse to the dominant mode (e.g., just going straight) seen in training data, ignoring rarer but necessary maneuvers.
  • Regression-based planning struggles with non-convex solution spaces, often outputting invalid trajectories that violate physical or safety constraints.
Concrete Example: When interacting with an oncoming vehicle, a driver might yield or overtake. A deterministic model might average these valid options and output a collision course. VADv2 models the distribution, allowing it to sample one valid mode (yield or overtake) rather than an invalid average.
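The averaging failure in this example can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the action is reduced to a single scalar maneuver coordinate, the two demonstrated modes sit at -2.0 and +2.0, and any action with absolute value below 1.0 counts as a collision. The `sampled_output` function is a hypothetical stand-in for a probabilistic planner, not VADv2 itself.

```python
import random

# Toy assumptions (not from the paper): one scalar maneuver coordinate,
# two valid demonstrated modes, and a blocked region between them.
YIELD_ACTION = -2.0
OVERTAKE_ACTION = 2.0
BLOCKED_HALF_WIDTH = 1.0


def collides(action: float) -> bool:
    """An action inside the blocked region is treated as a collision."""
    return abs(action) < BLOCKED_HALF_WIDTH


def regression_output(demos):
    """For identical inputs, an L2-trained regressor tends toward the target mean."""
    return sum(demos) / len(demos)


def sampled_output(demos, rng):
    """Hypothetical probabilistic planner: draw one demonstrated mode."""
    return rng.choice(demos)


# Identical scenario, half the drivers yield and half overtake.
demos = [YIELD_ACTION] * 50 + [OVERTAKE_ACTION] * 50
rng = random.Random(0)

mean_action = regression_output(demos)
sampled_action = sampled_output(demos, rng)

print("regression:", mean_action, "collides:", collides(mean_action))
print("sampling:  ", sampled_action, "collides:", collides(sampled_action))
```

The mean of the two modes lands at 0.0, inside the blocked region, while any sampled action is one of the two demonstrated modes and stays outside it.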
Key Novelty
Probabilistic Planning with Vectorized Vocabulary
  • Discretizes the continuous planning space into a large 'vocabulary' of 4,096 physically feasible trajectories sampled from expert demonstrations.
  • Models planning as a probabilistic field: given environmental tokens, the network predicts a probability distribution over this entire trajectory vocabulary.
  • Selects actions by sampling from the predicted distribution, allowing the system to handle multi-modal scenarios and non-deterministic human behaviors.
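The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the vocabulary has 4 hand-written entries instead of 4,096, the `logits` are made-up scores standing in for a network output, and the helper names (`softmax`, `nearest_vocab_index`, `sample_action`) and the squared-distance metric are hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def nearest_vocab_index(trajectory, vocabulary):
    """Discretize a continuous trajectory to its closest vocabulary entry.

    Uses summed squared waypoint distance; the metric is an assumption.
    """
    def dist(a, b):
        return sum((ax - bx) ** 2 + (ay - by) ** 2
                   for (ax, ay), (bx, by) in zip(a, b))
    return min(range(len(vocabulary)),
               key=lambda i: dist(trajectory, vocabulary[i]))


def sample_action(vocabulary, logits, rng):
    """Turn per-trajectory scores into a distribution and sample one entry."""
    probs = softmax(logits)
    idx = rng.choices(range(len(vocabulary)), weights=probs, k=1)[0]
    return idx, vocabulary[idx], probs


# Tiny stand-in vocabulary: each entry is a list of (x, y) waypoints.
vocabulary = [
    [(0.0, 2.0), (0.0, 4.0), (0.0, 6.0)],      # keep lane
    [(-0.5, 2.0), (-1.5, 4.0), (-3.0, 6.0)],   # shift left
    [(0.5, 2.0), (1.5, 4.0), (3.0, 6.0)],      # shift right
    [(0.0, 0.5), (0.0, 0.8), (0.0, 1.0)],      # slow down
]

# Hypothetical scores: two maneuvers are equally plausible (multi-modal).
logits = [0.2, 2.0, -1.0, 2.0]

rng = random.Random(0)
idx, trajectory, probs = sample_action(vocabulary, logits, rng)

# Map a continuous demonstration onto the discrete vocabulary.
demo = [(-0.4, 2.1), (-1.4, 3.9), (-2.8, 6.2)]
demo_idx = nearest_vocab_index(demo, vocabulary)

print("probabilities:", [round(p, 3) for p in probs])
print("sampled index:", idx)
print("demo maps to vocabulary entry:", demo_idx)
```

Because the output is always one of the vocabulary entries, a sampled action is a trajectory that was considered feasible when the vocabulary was built, rather than a blend of several.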
Architecture
Figure 2: The overall framework of VADv2, detailing the flow from multi-view images to probabilistic action sampling.
Evaluation Highlights
  • Reported to achieve state-of-the-art closed-loop performance on the CARLA Town05 benchmark, outperforming the existing methods it is compared against.
  • Runs stably in a fully end-to-end manner using only camera sensors, even without rule-based wrappers.
  • Demonstrates the ability to handle complex scenarios such as lane changes and vehicle interactions, with lower collision rates than deterministic baselines.
Breakthrough Assessment
8/10
A significant shift from deterministic regression to probabilistic, vocabulary-based planning in end-to-end driving. It addresses the 'mode averaging' safety issue inherent in regression and achieves state-of-the-art results on CARLA.