| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| RAG-Driver achieves state-of-the-art results on BDD-X for both action explanation and justification. | | | | |
| BDD-X | BLEU-4 (Action) | 32.6 | 36.2 | +3.6 |
| BDD-X | CIDEr (Action) | 205.8 | 214.3 | +8.5 |
| BDD-X | BLEU-4 (Justification) | 27.5 | 30.3 | +2.8 |
| In zero-shot generalization to the unseen Spoken-London dataset, action BLEU-4 improves by 6.7 and action CIDEr by 33.5 over the baseline. | | | | |
| Spoken-London | BLEU-4 (Action) | 18.4 | 25.1 | +6.7 |
| Spoken-London | CIDEr (Action) | 65.2 | 98.7 | +33.5 |
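
The Δ column is the difference between the "This Paper" and "Baseline" scores. As a minimal sketch, the snippet below recomputes each Δ from the table values and also derives a relative gain in percent. The relative gain is not reported in the table; it is computed here only for illustration, and the names `ROWS`, `summarize`, and `SUMMARY` are hypothetical.

```python
# Rows copied from the table above:
# (benchmark, metric, baseline, this_paper, reported_delta)
ROWS = [
    ("BDD-X", "BLEU-4 (Action)", 32.6, 36.2, 3.6),
    ("BDD-X", "CIDEr (Action)", 205.8, 214.3, 8.5),
    ("BDD-X", "BLEU-4 (Justification)", 27.5, 30.3, 2.8),
    ("Spoken-London", "BLEU-4 (Action)", 18.4, 25.1, 6.7),
    ("Spoken-London", "CIDEr (Action)", 65.2, 98.7, 33.5),
]


def summarize(rows):
    """Return (benchmark, metric, absolute_gain, relative_gain_percent) per row."""
    out = []
    for benchmark, metric, baseline, ours, reported in rows:
        # Round to one decimal to match the precision used in the table.
        delta = round(ours - baseline, 1)
        # Fail loudly if the recomputed gain disagrees with the Δ column.
        if delta != reported:
            raise ValueError(f"{benchmark} / {metric}: {delta} != {reported}")
        # Derived quantity (an assumption of this sketch, not in the table).
        relative = 100.0 * (ours - baseline) / baseline
        out.append((benchmark, metric, delta, relative))
    return out


SUMMARY = summarize(ROWS)

if __name__ == "__main__":
    for benchmark, metric, delta, relative in SUMMARY:
        print(f"{benchmark:14s} {metric:24s} +{delta:.1f} ({relative:.1f}% relative)")
```

Absolute gains on different metrics are not directly comparable, since CIDEr and BLEU-4 are on different scales; the relative gain gives a rough common footing when comparing the in-domain and zero-shot rows.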