
CoT-Drive: Efficient Motion Forecasting for Autonomous Driving With LLMs and Chain-of-Thought Prompting

Haicheng Liao, Hanlin Kong, Bonan Wang, Chengyue Wang, Kanye Ye Wang, Zhengbing He, Chengzhong Xu, Zhenning Li
State Key Laboratory of Internet of Things for Smart City
IEEE Transactions on Artificial Intelligence (2025)
MM Reasoning Benchmark

📝 Paper Summary

Motion Forecasting · Autonomous Driving · Knowledge Distillation
CoT-Drive distills the reasoning capabilities of GPT-4 into lightweight edge models via Chain-of-Thought prompting to enable real-time, context-aware motion forecasting on resource-constrained devices.
Core Problem
Deep learning models for motion forecasting often fail in corner cases due to poor contextual understanding, while powerful LLMs like GPT-4 are too slow or costly for real-time deployment in vehicles.
Why it matters:
  • Online LLMs (e.g., GPT-4) suffer from latency, network instability, and privacy risks, making them unsafe for real-time autonomous driving decisions
  • Offline LLMs (e.g., Llama-2) are computationally heavy for edge devices and often lack the reasoning flexibility to handle rare, complex traffic scenarios
  • Existing data-driven forecasting models struggle to generalize to unseen environments, compromising safety in heterogeneous traffic
Concrete Example: In a complex intersection with mixed agents (cyclists, pedestrians), a standard model might miss the subtle interaction cues that precede a cyclist's sudden turn. An online LLM could reason through the scene but might not return a prediction in time because of network lag, which could lead to a collision.
Key Novelty
Teacher-Student Chain-of-Thought Distillation
  • Uses GPT-4 Turbo as a 'teacher' to generate rich, step-by-step semantic analysis (Chain-of-Thought) of traffic scenes, covering interaction analysis and risk assessment
  • Distills this reasoning capability into a lightweight 'student' language model (Edge LM) that runs locally, enabling it to mimic the teacher's deep understanding without the computational cost
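The teacher-student step above can be sketched as a data-preparation routine: a teacher model produces a Chain-of-Thought rationale for each scene prompt, and the resulting (input, target) pairs become fine-tuning data for the student. This is a minimal sketch under stated assumptions, not the paper's implementation. The names `build_distillation_pairs` and `stub_teacher` are hypothetical, the teacher is a local stand-in rather than a real GPT-4 Turbo call, and the summary does not specify the student's training objective.

```python
from typing import Callable, Iterable

def build_distillation_pairs(
    scene_prompts: Iterable[str],
    teacher: Callable[[str], str],
) -> list[dict[str, str]]:
    """Query the teacher once per scene and keep non-empty rationales.

    Each returned dict holds the scene prompt as "input" and the teacher's
    step-by-step rationale as "target". A student LM could later be
    fine-tuned to reproduce "target" given "input" (assumed objective).
    """
    pairs = []
    for prompt in scene_prompts:
        rationale = teacher(prompt).strip()
        if not rationale:
            continue  # drop scenes for which the teacher returned nothing
        pairs.append({"input": prompt, "target": rationale})
    return pairs

def stub_teacher(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned rationale."""
    return (
        "Interaction Analysis: the target vehicle is closing on a slower car. "
        "Risk Assessment: moderate. Prediction: lane change to the left."
    )

if __name__ == "__main__":
    scenes = ["Scene 1: highway, three lanes, target behind a slow truck."]
    for pair in build_distillation_pairs(scenes, stub_teacher):
        print(pair["input"], "->", pair["target"])
```

In a real pipeline the stub would be replaced by a call to the teacher model, and the pairs would be written to disk once so that the teacher is never needed at inference time.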
Architecture
Figure 3: The overall CoT-Drive framework, detailing the encoder-decoder structure and the teacher-student distillation process.
Evaluation Highlights
  • Constructed the 'Highway-Text' dataset, containing over 6,600 annotated scenarios from the NGSIM and HighD benchmarks
  • Constructed the 'Urban-Text' dataset, containing over 5,400 annotated samples from the MoCAD and ApolloScape benchmarks
  • Proposed a novel zero-shot CoT prompting strategy that breaks scene analysis into four steps: Background, Interaction Analysis, Risk Assessment, and Prediction
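The four-step prompting strategy listed above can be illustrated with a small prompt builder. This is a hedged sketch only: the step names come from the summary, but the prompt wording, the `Agent` fields, and the function `build_cot_prompt` are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

# Step names as listed in the summary; everything else here is illustrative.
COT_STEPS = ("Background", "Interaction Analysis", "Risk Assessment", "Prediction")

@dataclass
class Agent:
    agent_id: str
    kind: str                      # e.g. "car", "cyclist", "pedestrian"
    position: tuple[float, float]  # (x, y) in metres, hypothetical frame
    speed_mps: float

def build_cot_prompt(scene_summary: str, agents: list[Agent], target_id: str) -> str:
    """Assemble a zero-shot prompt that asks for one section per CoT step."""
    lines = [
        "You are analysing a traffic scene.",
        f"Scene: {scene_summary}",
        "Agents:",
    ]
    for a in agents:
        x, y = a.position
        lines.append(
            f"- {a.agent_id} ({a.kind}) at x={x:.1f} m, y={y:.1f} m, "
            f"speed={a.speed_mps:.1f} m/s"
        )
    lines.append(f"Target agent: {target_id}")
    lines.append("Reason step by step, writing one section per step:")
    for i, step in enumerate(COT_STEPS, start=1):
        lines.append(f"{i}. {step}:")
    return "\n".join(lines)

if __name__ == "__main__":
    agents = [
        Agent("car_0", "car", (0.0, 0.0), 12.5),
        Agent("cyclist_1", "cyclist", (8.0, 3.5), 5.0),
    ]
    print(build_cot_prompt("Urban intersection, mixed traffic.", agents, "cyclist_1"))
```

Because the prompt contains no worked examples, it stays zero-shot; the numbered headings only fix the order in which the model is asked to reason.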
Breakthrough Assessment
7/10
Innovative application of LLM distillation for real-time motion forecasting. The creation of large-scale text-description datasets for traffic scenes is a significant contribution, though the core architectural fusion is an incremental improvement on encoder-decoder schemes.