Evaluation Setup
The method is evaluated on a subset of the NuScenes dataset (real-world) and on a highway simulation. Tasks include object detection, scene understanding, and driving decision-making.
Benchmarks:
- NuScenes Dataset (Autonomous Driving Perception and Decision Making)
Metrics:
- Perceptual Accuracy (correct identification of target objects, e.g. cars and traffic lights)
- Decision Accuracy (correct driving action selection)
- Mathematical Accuracy (correct calculation of vehicle distance)
- Statistical methodology: Not explicitly reported in the paper
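Each metric above is described as a rate of correct outputs. A minimal sketch of such a score follows, assuming exact-match scoring reported as a percentage; the summary does not state the paper's exact scoring procedure. The function name `accuracy` and the action labels are hypothetical.

```python
def accuracy(predictions, references):
    """Return the percentage of predictions that exactly match the references.

    Assumption: exact-match scoring. The paper's actual scoring rule is not
    given in this summary.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)


# Hypothetical driving-action labels, for illustration only.
predicted_actions = ["stop", "accelerate", "stop", "keep_lane"]
reference_actions = ["stop", "decelerate", "stop", "keep_lane"]

decision_accuracy = accuracy(predicted_actions, reference_actions)
print(decision_accuracy)  # 75.0 (3 of 4 actions match)
```

The same function would apply to perceptual labels. A numeric metric such as vehicle distance would more plausibly need a tolerance than strict equality, which is a further assumption.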
Key Results
| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| NuScenes | Average Perceptual Accuracy | 94 | 92 | -2 |
| NuScenes | Car Recognition Accuracy | 100 | 100 | 0 |
| NuScenes | Mathematical Accuracy | 0 | 100 | +100 |
| NuScenes | Mathematical Accuracy | 100 | 100 | 0 |
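The Δ column is consistent with a plain difference, This Paper minus Baseline. The short sketch below recomputes it from the reported values as a sanity check; the variable and function names are illustrative only.

```python
# (metric, baseline, this_paper) values copied from the results table.
rows = [
    ("Average Perceptual Accuracy", 94, 92),
    ("Car Recognition Accuracy", 100, 100),
    ("Mathematical Accuracy", 0, 100),
    ("Mathematical Accuracy", 100, 100),
]


def delta(baseline, this_paper):
    """Difference between the paper's score and the baseline score."""
    return this_paper - baseline


deltas = [delta(baseline, this_paper) for _, baseline, this_paper in rows]

for (metric, baseline, this_paper), d in zip(rows, deltas):
    print(f"{metric}: baseline={baseline}, this paper={this_paper}, delta={d:+d}")
```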
Main Takeaways
- PKRD-CoT significantly improves decision-making accuracy (22% over zero-shot, 6% over role-playing), validating the importance of structured reasoning in autonomous driving (AD) tasks
- GPT-4.0 demonstrates the most robust performance across all dimensions, particularly in mathematical reasoning where smaller models like MiniGPT-4 fail completely
- Open-source models like Qwen-VL-Plus show competitive performance in perception and reasoning, though some struggle with specific targets like traffic lights
- The 'Knowledge' component allows models to infer actions from static signs (e.g., Red Light implies Stop) without explicit training, mimicking human driver logic
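The sign-to-action inference in the last takeaway can be illustrated with a toy lookup. This is only a sketch of the mapping idea, not the paper's mechanism, which relies on the model's own reasoning. Only the "red light implies stop" rule comes from the text; the other rules, the default action, and all names are hypothetical.

```python
# Toy knowledge table mapping perceived static signs to driving actions.
# Only "red light" -> "stop" is taken from the text; the rest are assumptions.
KNOWLEDGE_RULES = {
    "red light": "stop",
    "stop sign": "stop",         # hypothetical rule
    "green light": "proceed",    # hypothetical rule
}


def infer_action(perceived_objects, default="maintain course"):
    """Return the action for the first perceived object that has a rule.

    Falls back to `default` (an assumed placeholder) when no rule applies.
    """
    for obj in perceived_objects:
        action = KNOWLEDGE_RULES.get(obj.lower())
        if action is not None:
            return action
    return default


print(infer_action(["car", "Red Light"]))  # stop
print(infer_action(["car", "pedestrian"]))  # maintain course
```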