PlanAgent: A Multi-modal Large Language Agent for Closed-loop Vehicle Motion Planning

Yupeng Zheng, Zebin Xing, Qichao Zhang, Bu Jin, Pengfei Li, Yuhang Zheng, Zhongpu Xia, Kun Zhan, Xianpeng Lang, Yaran Chen, Dongbin Zhao
State Key Laboratory of Multimodal Artificial Intelligence Systems, School of Mechanical Engineering and Automation, Department of Computer Science and Technology
arXiv.org (2024)
MM Agent Reasoning Benchmark

📝 Paper Summary

Autonomous driving motion planning Multi-modal Large Language Models (MLLMs)
PlanAgent is a multi-modal agent that utilizes an MLLM with hierarchical reasoning and simulation-based reflection to generate robust vehicle motion planners for closed-loop autonomous driving.
Core Problem
Existing rule-based planners struggle with long-tailed scenarios. Learning-based methods often overfit and perform poorly in large-scale closed-loop settings because they lack interpretability and common sense.
Why it matters:
  • Rule-based methods (e.g., PDM) handle common driving situations well but fail in complex, rare (long-tail) situations that require nuanced maneuvering
  • Current learning-based planners frequently fail in closed-loop evaluation despite strong open-loop results, because small errors compound over the rollout
  • Previous LLM-based attempts rely on open-loop metrics or inefficient text representations of maps, limiting real-world applicability
Concrete Example: In a long-tailed scenario requiring complex maneuvers, rule-based methods may be too conservative or rigid, while PlanAgent can reason about the 'precautions in this type of scene' (e.g., merging at roundabouts) to adjust planner parameters dynamically.
Key Novelty
Closed-Loop Mid-to-Mid MLLM Planning Agent
  • Transforms environmental data into a token-efficient hybrid prompt for the MLLM: a visual BEV (bird's-eye-view) map for global context and a lane-graph-based textual description for precise local topology
  • Uses a Reasoning Engine with a hierarchical chain-of-thought (CoT) to bridge high-level scene understanding with low-level Python code generation for an IDM (Intelligent Driver Model) planner
  • Integrates a Reflection module that validates generated planners via short-term simulation, filtering out unsafe proposals before execution
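The paper describes the Reasoning Engine as emitting Python code that configures an IDM planner, and the Reflection module as screening candidates in short-term simulation. The sketch below illustrates that idea for a single-lane car-following case only. It is a hypothetical illustration, not PlanAgent's code or simulator. The `IDMParams` fields and defaults, the helpers `simulate_min_gap` and `reflect`, the 6 m/s² braking limit, the 4 s horizon, and the 2 m safety margin are all assumptions. Only the acceleration law is the standard IDM formula.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class IDMParams:
    """Hypothetical IDM parameter set an MLLM might emit (names/defaults are assumptions)."""
    v0: float = 15.0     # desired speed [m/s]
    T: float = 1.5       # desired time headway [s]
    s0: float = 2.0      # minimum standstill gap [m]
    a_max: float = 1.0   # maximum acceleration [m/s^2]
    b: float = 2.0       # comfortable deceleration [m/s^2]
    delta: float = 4.0   # acceleration exponent


def idm_accel(p: IDMParams, v: float, gap: float, dv: float) -> float:
    """Standard IDM acceleration. dv = v_ego - v_leader (positive when closing in)."""
    # Desired gap s*; the dynamic part is clamped at 0 so s* never drops below s0.
    s_star = p.s0 + max(0.0, v * p.T + v * dv / (2.0 * math.sqrt(p.a_max * p.b)))
    return p.a_max * (1.0 - (v / p.v0) ** p.delta - (s_star / max(gap, 1e-3)) ** 2)


def simulate_min_gap(p: IDMParams, ego_v: float, gap: float, lead_v: float,
                     lead_accel: float = 0.0, horizon_s: float = 4.0,
                     dt: float = 0.1, decel_limit: float = 6.0) -> float:
    """Roll the ego forward behind one leader and return the smallest gap seen (hypothetical check)."""
    min_gap = gap
    for _ in range(int(round(horizon_s / dt))):
        # Clamp to an assumed physical braking limit; raw IDM deceleration is unbounded.
        a = max(idm_accel(p, ego_v, gap, ego_v - lead_v), -decel_limit)
        ego_v = max(0.0, ego_v + a * dt)
        lead_v = max(0.0, lead_v + lead_accel * dt)
        gap += (lead_v - ego_v) * dt
        min_gap = min(min_gap, gap)
    return min_gap


def reflect(candidates: List[IDMParams], ego_v: float, gap: float, lead_v: float,
            lead_accel: float = 0.0, safe_gap: float = 2.0) -> List[IDMParams]:
    """Keep only candidates whose short rollout never gets closer than safe_gap."""
    return [p for p in candidates
            if simulate_min_gap(p, ego_v, gap, lead_v, lead_accel) >= safe_gap]


# Usage with two hypothetical candidates and a hypothetical braking leader.
cautious = IDMParams(v0=10.0, T=2.0, b=1.5)
aggressive = IDMParams(v0=20.0, T=0.8, a_max=2.0, b=3.0)
kept = reflect([cautious, aggressive], ego_v=12.0, gap=25.0, lead_v=12.0, lead_accel=-3.0)
```

Clamping the IDM output to a braking limit is what gives the check teeth. Without it, the model can decelerate arbitrarily hard in simulation, and almost any candidate would pass.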
Architecture
Fig. 2: The overall pipeline of PlanAgent, illustrating the flow from Environment Transformation to the Reasoning Engine and Reflection.
Evaluation Highlights
  • Outperforms state-of-the-art methods (PDM-Closed, PlanTF) on the nuPlan Val14 and Test14-hard benchmarks in closed-loop settings
  • Achieves superior scores in reactive and non-reactive tests compared to both rule-based and learning-based baselines
  • Uses roughly one-third as many tokens for the textual scene description as existing LLM-based SOTA methods, thanks to its lane-graph representation
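The following minimal sketch shows what a lane-graph textual description might look like. It assumes a hypothetical `Lane` schema (length, speed limit, successors, left/right neighbors) and a hypothetical `describe_lane_graph` helper. The paper's actual prompt format and fields are not reproduced here. The helper walks the graph breadth-first from the ego lane and emits one short line per lane. Describing topology rather than dense polyline coordinates is one plausible reason this kind of representation needs fewer tokens.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Lane:
    """Hypothetical lane-graph node; field names are illustrative, not the paper's schema."""
    lane_id: str
    length_m: float
    speed_limit_mps: float
    successors: List[str] = field(default_factory=list)
    left: Optional[str] = None   # adjacent lane to the left, if any
    right: Optional[str] = None  # adjacent lane to the right, if any


def describe_lane_graph(lanes: Dict[str, Lane], ego_lane: str, max_depth: int = 2) -> str:
    """Breadth-first walk from the ego lane, emitting one compact line per reachable lane."""
    lines, seen = [], {ego_lane}
    queue = deque([(ego_lane, 0)])
    while queue:
        lane_id, depth = queue.popleft()
        lane = lanes[lane_id]
        parts = [f"{lane.lane_id}: len={lane.length_m:.0f}m, limit={lane.speed_limit_mps:.0f}m/s"]
        if lane.left:
            parts.append(f"left={lane.left}")
        if lane.right:
            parts.append(f"right={lane.right}")
        if lane.successors:
            parts.append("next=" + "/".join(lane.successors))
        lines.append(", ".join(parts))
        # Expand successors first, then lateral neighbors, up to max_depth hops from the ego lane.
        if depth < max_depth:
            for nxt in lane.successors + [n for n in (lane.left, lane.right) if n]:
                if nxt in lanes and nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, depth + 1))
    return "\n".join(lines)


# Usage on a tiny hypothetical two-lane road that splits ahead.
lanes = {
    "L1": Lane("L1", 40.0, 13.9, successors=["L3"], left="L2"),
    "L2": Lane("L2", 40.0, 13.9, successors=["L4"], right="L1"),
    "L3": Lane("L3", 25.0, 8.3),
    "L4": Lane("L4", 30.0, 13.9),
}
desc = describe_lane_graph(lanes, "L1")
print(desc)
```

Limiting the walk by `max_depth` keeps the description local to the ego vehicle. In this split, the hybrid prompt leaves global context to the BEV image.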
Breakthrough Assessment
8/10
Significantly advances LLM applications in autonomous driving by moving from open-loop trajectory prediction to closed-loop planner code generation with verification, addressing the critical safety/stability gap.