closed-loop: Evaluation in which the model's actions influence the states it sees next (e.g., driving a car in a simulator), as opposed to open-loop evaluation, which scores predicted actions against a pre-recorded dataset
Action Dreaming: A proposed method to generate synthetic training data by simulating alternative futures (e.g., unsafe maneuvers) for a static visual scene to force language-action alignment
CARLA: An open-source simulator for autonomous driving research
Chain-of-Thought: A reasoning process where the model generates intermediate reasoning steps (text) before producing the final output (action)
VQA: Visual Question Answering, the task of answering natural-language questions about an image
temporal waypoints: Future vehicle coordinates at fixed time intervals (e.g., every 0.25s); the spacing between consecutive points therefore encodes speed
geometric path waypoints: Future vehicle coordinates at fixed distance intervals (e.g., every 1m), capturing the spatial path independently of speed
PID controller: Proportional-Integral-Derivative controller, a feedback control loop that drives a measured value toward a setpoint (used here to convert waypoints into steering and throttle commands)
InternVL2: A specific family of Vision-Language Models used as the backbone
LLM: Large Language Model
TransFuser: A baseline autonomous driving model that fuses camera and LiDAR data
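
To make the temporal waypoints, geometric path waypoints, and PID controller entries above more concrete, here is a minimal Python sketch. It assumes straight-line motion at constant speed. All function names, the class name, and the gain values are hypothetical, chosen only for illustration; the source does not specify how the waypoints are produced or how the controller is tuned.

```python
import math

def temporal_waypoints(speed_mps, dt=0.25, n=4):
    """Positions on a straight path sampled every dt seconds (illustrative)."""
    return [(speed_mps * dt * (i + 1), 0.0) for i in range(n)]

def resample_by_distance(points, spacing=1.0):
    """Resample a polyline at fixed arc-length intervals (geometric waypoints)."""
    out = []
    target = spacing   # arc length of the next point to emit
    travelled = 0.0    # arc length at the start of the current segment
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        while seg > 0 and target <= travelled + seg + 1e-9:
            t = (target - travelled) / seg
            out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            target += spacing
        travelled += seg
    return out

def target_speed(waypoints, dt=0.25):
    """Speed implied by the gap between the first two temporal waypoints."""
    (x0, y0), (x1, y1) = waypoints[0], waypoints[1]
    return math.hypot(x1 - x0, y1 - y0) / dt

class PID:
    """Textbook PID controller; the gains passed in are arbitrary assumptions."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, error, dt):
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no previous sample on the first call
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Same straight road, two different speeds.
slow = temporal_waypoints(speed_mps=4.0)
fast = temporal_waypoints(speed_mps=8.0)

# Geometric waypoints: prepend the ego position, then sample every 1 m.
slow_path = resample_by_distance([(0.0, 0.0)] + slow, spacing=1.0)
fast_path = resample_by_distance([(0.0, 0.0)] + fast, spacing=1.0)

# Throttle from the speed error, clamped to [0, 1] (hypothetical wiring).
pid = PID(kp=0.5, ki=0.0, kd=0.0)
current_speed = 2.0
throttle = min(1.0, max(0.0, pid.step(target_speed(slow) - current_speed, dt=0.25)))
```

In this sketch the two sets of temporal waypoints differ because the speeds differ, while the first four geometric path waypoints coincide, since both vehicles follow the same road. The PID step then turns the speed error implied by the temporal waypoints into a throttle command; a steering command could be derived in a similar way from the heading error toward a geometric waypoint.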