BEV: Bird's Eye View. A top-down representation of the driving scene, commonly used to unify multi-view camera data in a single coordinate frame
mIoU: mean Intersection over Union. A standard segmentation metric: the overlap between predicted and ground-truth regions divided by their union, computed per class and then averaged over classes
VPQ: Video Panoptic Quality. A metric that evaluates both the recognition and the tracking quality of objects over time in a video sequence
Occupancy: A 3D grid representation where each voxel indicates whether space is occupied by an object or free
World Model: A generative model that learns to simulate the physics and dynamics of an environment, predicting future states based on current states and actions
Centripetal Flow: A vector field describing the motion of points towards the center of an instance (object), used here to model object dynamics
Ego-motion: The movement of the autonomous vehicle itself
Lovász Loss: A loss function that acts as a differentiable surrogate for the Intersection over Union (IoU) metric, so that IoU can be optimized during training
Affine Transformation: A geometric transformation that preserves lines and parallelism, used here in normalization to scale and shift features
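
To make the mIoU and Occupancy entries concrete, here is a minimal sketch of how mIoU could be computed over a flattened occupancy grid. The function name `mean_iou`, the two-class labeling (0 = free, 1 = occupied), and the toy 8-voxel grid are illustrative assumptions, not taken from the source. Skipping classes that appear in neither the prediction nor the ground truth is one common convention, and other implementations handle that case differently.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flattened voxel labels.

    Classes absent from both pred and target are skipped (an assumed convention).
    """
    ious = []
    for c in range(num_classes):
        # Voxels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Voxels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical 2x2x2 occupancy grid flattened to 8 voxels: 0 = free, 1 = occupied.
target = [0, 0, 1, 1, 0, 1, 0, 0]
pred = [0, 1, 1, 1, 0, 0, 0, 0]

score = mean_iou(pred, target, num_classes=2)
print(f"mIoU = {score:.4f}")
```

In this toy case the free class has IoU 4/6 and the occupied class has IoU 2/4, so the mean is 7/12.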