VLA: Vision-Language-Action—models that process visual and textual inputs to directly output physical actions or trajectories
RWM: Reward World Model—a learned neural network that predicts the quality (reward) of a trajectory without running a full physics simulation
IRL: Inverse Reinforcement Learning—learning a reward function from expert demonstrations rather than defining it manually
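EPDMS: Extended PDM Score (Extended Predictive Driver Model Score)—a composite driving metric that aggregates sub-scores including at-fault collisions, drivable area and traffic rule compliance, time to collision, ego progress, and comfort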
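PPO: Proximal Policy Optimization—a policy gradient reinforcement learning algorithm that keeps updates stable by clipping the ratio between new and old action probabilities in its surrogate objective; used here to fine-tune the VLA
GAE: Generalized Advantage Estimation—a method that reduces the variance of policy gradient estimates by computing each advantage as an exponentially weighted (by γλ) sum of temporal-difference residuals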
BEV: Bird's Eye View—a top-down representation of the driving scene
Sim2Real: Simulation-to-Real gap—the discrepancy between simulated environments and the real world, which often degrades the real-world performance of models trained or evaluated in simulation
DAC: Drivable Area Compliance—a metric checking whether the vehicle stays within the road boundaries
TTC: Time to Collision—a safety metric measuring the time remaining before an impact would occur if the involved agents kept their current motion
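To make the GAE entry concrete, here is a minimal sketch of the standard backward recursion A_t = δ_t + γλ·A_{t+1}, with δ_t = r_t + γV(s_{t+1}) − V(s_t). The function name `compute_gae`, its arguments, and the defaults γ = 0.99, λ = 0.95 are illustrative assumptions, not the values or code used to fine-tune the VLA.

```python
from typing import List


def compute_gae(rewards: List[float], values: List[float], dones: List[bool],
                last_value: float, gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Hypothetical helper: GAE advantages for a single rollout of length T.

    values[t] is the critic estimate V(s_t); last_value is V(s_T), used to
    bootstrap the final step. dones[t] is True if the episode ended after step t.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        # Do not bootstrap across an episode boundary.
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # Recursive form: A_t = delta_t + gamma * lambda * A_{t+1}
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages
```

Setting λ = 0 reduces each advantage to the one-step TD residual (low variance, more bias), while λ = 1 gives the full Monte Carlo return minus the baseline (unbiased, high variance); intermediate λ trades between the two.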
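The stability that the PPO entry refers to comes from the clipped surrogate objective. The sketch below computes that objective (negated, as a loss to minimize) from per-sample log-probabilities and advantages. The function name and the clip range ε = 0.2 are assumptions for illustration; any value-function, entropy, or KL terms the actual VLA fine-tuning may add are not shown.

```python
import math
from typing import List


def ppo_clip_loss(logp_new: List[float], logp_old: List[float],
                  advantages: List[float], clip_eps: float = 0.2) -> float:
    """Hypothetical helper: negative mean of PPO's clipped surrogate objective."""
    if not advantages:
        raise ValueError("need at least one sample")
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(ln - lo)
        unclipped = ratio * adv
        # Clip the ratio to [1 - eps, 1 + eps] so one update cannot move the policy far.
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps) * adv
        # Taking the minimum gives a pessimistic (lower) bound on the improvement.
        total += min(unclipped, clipped)
    return -total / len(advantages)
```

For example, a sample whose probability doubled (ratio 2) with advantage +1 contributes only 1.2 to the objective, so the gradient stops pushing that action's probability higher once the ratio exceeds 1 + ε.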
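As a worked illustration of the TTC entry, the textbook longitudinal case divides the gap to a lead vehicle by the closing speed, assuming both vehicles hold constant velocity. This is a simplified sketch; driving benchmarks typically evaluate TTC with full bounding boxes over a forecast horizon, and the function name and argument names here are hypothetical.

```python
import math


def time_to_collision(gap_m: float, ego_speed: float, lead_speed: float) -> float:
    """Hypothetical helper: constant-velocity TTC to a vehicle ahead in the same lane.

    gap_m is the bumper-to-bumper distance in metres; speeds are in m/s.
    Returns math.inf when the ego vehicle is not closing the gap.
    """
    if gap_m <= 0.0:
        # Already in contact.
        return 0.0
    closing_speed = ego_speed - lead_speed
    if closing_speed <= 0.0:
        return math.inf
    return gap_m / closing_speed
```

For instance, a 30 m gap with the ego car at 20 m/s and the lead car at 10 m/s gives a TTC of 3 s; a safety check would then compare this value against a chosen threshold.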
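The DAC entry can be illustrated with a point-in-polygon test: a trajectory is compliant if every pose lies inside the drivable-area polygon. The sketch below uses the even-odd ray-casting rule and, as a simplifying assumption, checks only trajectory points rather than the vehicle's full footprint; the function names and the polygon representation are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from (x, y)."""
    inside = False
    j = len(poly) - 1
    for i in range(len(poly)):
        xi, yi = poly[i]
        xj, yj = poly[j]
        # Edge (j -> i) straddles the horizontal line through y.
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def drivable_area_compliant(trajectory: List[Point], drivable: List[Point]) -> bool:
    """Hypothetical DAC check: True if every trajectory point is inside the polygon."""
    return all(point_in_polygon(x, y, drivable) for x, y in trajectory)
```

A production check would test the corners of the vehicle's bounding box at each pose and handle multi-polygon road maps, for which a geometry library is usually preferable to hand-written ray casting.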