VLT: Vision-Language-Trajectory—a multimodal pre-training approach where the model learns to predict text descriptions and trajectories from visual inputs
GRPO: Group Relative Policy Optimization—an RL algorithm that updates a policy by sampling a group of outputs for the same input, scoring each one, and reinforcing those that score above the group average while discouraging those below it
RFS: Rater Feedback Score—a metric evaluating driving quality based on alignment with human-preferred trajectories
CoT: Chain-of-Thought—a prompting technique where the model generates intermediate reasoning steps before the final answer
ADE: Average Displacement Error—the average L2 distance between the predicted trajectory and the ground truth over a specific time horizon
nominal driving: Standard, everyday driving conditions (lane keeping, simple turns) as opposed to rare 'long-tail' events
SFT: Supervised Fine-Tuning—training the model on labeled data using standard cross-entropy loss
long-tail scenarios: Rare, edge-case driving situations (e.g., debris on road, erratic pedestrians) that are difficult to model
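
Two of the definitions above (ADE and GRPO) reduce to short formulas, so a minimal sketch may help make them concrete. The function names, the (x, y) waypoint format, and the sample numbers are all hypothetical and chosen for illustration. Dividing by the group standard deviation follows the common GRPO formulation and is an assumption here, since the definition above only mentions the group average.

```python
import math
from statistics import mean, pstdev


def average_displacement_error(pred, truth):
    """Mean L2 distance between matching waypoints of two trajectories."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("trajectories must be non-empty and equally long")
    return mean(math.dist(p, t) for p, t in zip(pred, truth))


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output against the mean of its own group.

    Assumption: rewards are also scaled by the group standard deviation;
    eps avoids division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical 3-waypoint trajectories as (x, y) positions in metres.
predicted = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
ground_truth = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
ade = average_displacement_error(predicted, ground_truth)  # (0 + 1 + 2) / 3

# Hypothetical rewards for four outputs sampled from the same input.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Outputs with a positive advantage are the ones the policy update reinforces, and those with a negative advantage are pushed down. Because the baseline is the group mean, no separate learned value function is needed in this sketch.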