FOV: Field of View, the angular extent of the scene that a sensor can observe at any given moment.
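mIoU: Mean Intersection over Union, the standard accuracy metric for semantic segmentation. For each class, IoU = TP / (TP + FP + FN), and mIoU is the mean of these per-class IoUs.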
Voxel: A volume element representing a value on a regular grid in 3D space.
GF-Phase: Geometry-based Feature Fusion Phase—fusing features based on explicit projection of 3D points onto 2D images.
SF-Phase: Semantic-based Feature Fusion Phase—fusing features based on learned semantic relationships (attention) rather than just geometric projection.
Asymmetric Augmentation: Applying different data augmentation strategies to different modalities (e.g., flipping point clouds but only color-jittering images) while maintaining alignment where necessary.
MHSA: Multi-Head Self-Attention, a mechanism that relates the positions of a single sequence to one another, using several attention heads in parallel, to compute a new representation of that sequence.
MHCA: Multi-Head Cross-Attention, a mechanism in which a query sequence attends to a different key/value sequence (e.g., points attending to semantic embeddings).
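Lovász-softmax: A loss function for semantic segmentation that optimizes the Jaccard index (IoU) through a tractable surrogate, the Lovász extension of the per-class Jaccard loss applied to softmax probabilities.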
Semantic Embeddings: High-level feature vectors representing specific object categories (e.g., 'car', 'pedestrian') aggregated from raw sensor features.
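As a concrete reading of the mIoU entry, the sketch below computes per-class IoU, TP / (TP + FP + FN), from flat lists of predicted and ground-truth class ids and averages it over classes. It is plain Python for self-containment. The ignore_index parameter and the choice to leave out classes that appear in neither predictions nor labels are assumptions, since benchmarks differ on both conventions.

```python
from collections import Counter


def mean_iou(preds, labels, num_classes, ignore_index=None):
    """Return (per-class IoU dict, mIoU) for flat lists of class ids.

    IoU_c = TP_c / (TP_c + FP_c + FN_c). Classes with an empty union
    (absent from both preds and labels) are left out of the mean.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for p, g in zip(preds, labels):
        if ignore_index is not None and g == ignore_index:
            continue  # unlabeled points count toward no class
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted p, but the point is not p
            fn[g] += 1  # the point is g, but g was not predicted
    ious = {}
    for c in range(num_classes):
        union = tp[c] + fp[c] + fn[c]
        if union > 0:
            ious[c] = tp[c] / union
    miou = sum(ious.values()) / len(ious) if ious else 0.0
    return ious, miou


ious, miou = mean_iou([0, 0, 1, 1, 2], [0, 1, 1, 1, 2], num_classes=3)
print(ious, round(miou, 4))  # {0: 0.5, 1: 0.6666666666666666, 2: 1.0} 0.7222
```

Here class 0 has one true positive and one false positive (IoU 0.5), class 1 has two true positives and one false negative (IoU 2/3), and class 2 is perfect, so mIoU is 13/18.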
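The Voxel entry can be illustrated with a minimal voxelization routine: each point is assigned to the grid cell floor((coordinate - origin) / voxel_size) along each axis, and the points sharing a cell are reduced to one value. Averaging coordinates is an assumed reduction chosen for brevity; real pipelines often keep per-voxel feature vectors, max-pool them, or rely on sparse-tensor libraries instead.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Map (x, y, z) points to integer voxel indices on a regular grid.

    Returns {(i, j, k): centroid of the points inside that voxel}.
    """
    buckets = defaultdict(list)
    for point in points:
        key = tuple(
            math.floor((coord - o) / voxel_size) for coord, o in zip(point, origin)
        )
        buckets[key].append(point)
    # Reduce each occupied voxel to the centroid of its member points.
    return {
        key: tuple(sum(axis) / len(members) for axis in zip(*members))
        for key, members in buckets.items()
    }


pts = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.0, 0.0), (-0.1, 0.0, 0.0)]
grid = voxelize(pts, voxel_size=0.5)
print(sorted(grid))  # [(-1, 0, 0), (0, 0, 0), (2, 0, 0)]
```

Using math.floor rather than int() matters for negative coordinates: -0.1 / 0.5 = -0.2 must land in cell -1, not cell 0.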
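The GF-Phase entry relies on projecting 3D points onto the image. The sketch below assumes a pinhole camera with intrinsics fx, fy, cx, cy and a LiDAR-to-camera rotation and translation; these parameter names are hypothetical. Points behind the camera or landing outside the image, i.e. outside the camera's FOV, get no pixel; every other point looks up the image feature at its pixel and concatenates it to its own feature. Nearest-pixel lookup and concatenation are illustrative assumptions; bilinear sampling or learned fusion layers are common alternatives.

```python
def project_points(points, rotation, translation, fx, fy, cx, cy, width, height):
    """Return one (u, v) pixel per point, or None if the point is not visible.

    rotation is a 3x3 nested list and translation a 3-vector that map LiDAR
    coordinates into the camera frame (x right, y down, z forward).
    """
    pixels = []
    for p in points:
        # Rigid transform into the camera frame: p_cam = R p + t.
        xc, yc, zc = (
            sum(r * c for r, c in zip(row, p)) + t
            for row, t in zip(rotation, translation)
        )
        if zc <= 0:  # behind the image plane
            pixels.append(None)
            continue
        # Pinhole projection onto the image plane.
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0 <= u < width and 0 <= v < height:
            pixels.append((int(u), int(v)))  # pixel containing the projection
        else:
            pixels.append(None)  # outside the camera FOV
    return pixels


def fuse_by_projection(point_feats, pixels, feature_map, img_dim):
    """Concatenate each point feature with the image feature at its pixel.

    feature_map[v][u] is a list of img_dim floats; invisible points get zeros.
    """
    fused = []
    for feat, px in zip(point_feats, pixels):
        img_feat = feature_map[px[1]][px[0]] if px is not None else [0.0] * img_dim
        fused.append(list(feat) + list(img_feat))
    return fused


identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
pts = [(0.0, 0.0, 10.0), (1.0, 0.0, -5.0), (10.0, 0.0, 1.0), (0.5, -0.5, 5.0)]
pixels = project_points(pts, identity, [0, 0, 0], 100, 100, 32, 24, 64, 48)
print(pixels)  # [(32, 24), None, None, (42, 14)]
```

In this toy setup the LiDAR and camera frames coincide (identity rotation, zero translation), so the second point is dropped for lying behind the camera and the third for projecting far outside the 64 x 48 image.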
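The Asymmetric Augmentation entry hinges on keeping the modalities aligned while augmenting them differently. The sketch below shows one way to do this, assumed rather than taken from the source: point-to-pixel correspondences are computed on the un-augmented data first, then only the point cloud is flipped and only the image is jittered. Because the correspondences are stored per point, the flip changes coordinates but not which pixel each point came from. Flipping the y axis and using a plain brightness scale in place of full color jitter are both simplifying assumptions.

```python
import random


def asymmetric_augment(points, pixels, image, rng):
    """Augment LiDAR points and the camera image independently.

    points: list of (x, y, z); pixels: per-point (u, v) or None, computed
    BEFORE augmentation; image: rows of [r, g, b] pixels with values in [0, 1].
    """
    # Geometric augmentation, point cloud only: random mirror of the y axis.
    flip = rng.random() < 0.5
    aug_points = [(x, -y, z) if flip else (x, y, z) for x, y, z in points]

    # Photometric augmentation, image only: brightness scaling clamped to [0, 1].
    factor = rng.uniform(0.8, 1.2)
    aug_image = [
        [[min(1.0, max(0.0, ch * factor)) for ch in px] for px in row] for row in image
    ]

    # Correspondences are indexed by point, so they remain valid unchanged.
    return aug_points, pixels, aug_image


rng = random.Random(0)
pts = [(1.0, 2.0, 0.5), (3.0, -1.0, 0.2)]
pix = [(10, 4), None]
img = [[[0.5, 0.9, 0.1] for _ in range(16)] for _ in range(8)]
aug_pts, aug_pix, aug_img = asymmetric_augment(pts, pix, img, rng)
```

Passing an explicit random.Random instance keeps the augmentation reproducible per sample without touching the global random state.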
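The MHSA and MHCA entries differ only in where the keys and values come from, which a single function can show. In the sketch below, self-attention passes the same sequence as queries and as keys/values, while cross-attention (as in the SF-Phase entry, points attending to semantic embeddings) passes a different key/value sequence. Each head applies scaled dot-product attention to its slice of the channels, and the head outputs are concatenated and projected. The weight matrices are hypothetical inputs; a real model would use a framework layer such as PyTorch's torch.nn.MultiheadAttention.

```python
import math


def matmul(a, b):
    """(n x k) @ (k x m) for nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def multi_head_attention(queries, keys_values, w_q, w_k, w_v, w_o, num_heads):
    """Multi-head scaled dot-product attention.

    queries: n x d, keys_values: m x d, all weight matrices d x d.
    Self-attention (MHSA): keys_values is queries.
    Cross-attention (MHCA): keys_values is a different sequence.
    """
    q = matmul(queries, w_q)
    k = matmul(keys_values, w_k)
    v = matmul(keys_values, w_v)
    d = len(q[0])
    assert d % num_heads == 0, "model dim must split evenly across heads"
    dh = d // num_heads
    concat = [[] for _ in q]
    for h in range(num_heads):
        cols = slice(h * dh, (h + 1) * dh)  # this head's channel slice
        qh = [row[cols] for row in q]
        kh = [row[cols] for row in k]
        vh = [row[cols] for row in v]
        scores = [
            [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(dh) for kj in kh]
            for qi in qh
        ]
        weights = [softmax(r) for r in scores]  # each query's weights sum to 1
        for i, out in enumerate(matmul(weights, vh)):
            concat[i].extend(out)  # concatenate heads along channels
    return matmul(concat, w_o)


def eye(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


points = [[0.1, 0.2, 0.3, 0.4], [5.0, 6.0, 7.0, 8.0]]
ident = eye(4)
mhsa = multi_head_attention(points, points, ident, ident, ident, ident, num_heads=2)
# With a single key/value row, every query must return exactly that row.
mhca = multi_head_attention(
    points, [[1.0, 2.0, 3.0, 4.0]], ident, ident, ident, ident, num_heads=2
)
print(mhca)  # [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
```

The output always has one row per query, so in cross-attention the number of points is preserved regardless of how many key/value rows they attend to.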
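The Lovász-softmax entry can be unpacked as follows. For each class present in the labels, the per-point errors |fg - p_c| are sorted in decreasing order and weighted by the discrete gradient of the Jaccard loss along that order; the class losses are then averaged. The sketch below follows that published formulation in plain Python but only evaluates the loss value; training requires an autodiff framework, and averaging over present classes only is one of several possible choices.

```python
def lovasz_grad(gt_sorted):
    """Discrete gradient of the Jaccard loss along errors sorted high to low.

    gt_sorted: 1.0/0.0 foreground flags in that sorted order (at least one 1.0).
    """
    total = sum(gt_sorted)
    grad, prev = [], 0.0
    cum_fg = cum_bg = 0.0
    for g in gt_sorted:
        cum_fg += g
        cum_bg += 1.0 - g
        intersection = total - cum_fg
        union = total + cum_bg
        jaccard_loss = 1.0 - intersection / union
        grad.append(jaccard_loss - prev)  # increment caused by this element
        prev = jaccard_loss
    return grad


def lovasz_softmax(probas, labels, num_classes):
    """probas: per-point softmax probabilities; labels: per-point class ids."""
    losses = []
    for c in range(num_classes):
        fg = [1.0 if y == c else 0.0 for y in labels]
        if sum(fg) == 0:
            continue  # skip classes absent from the labels
        errors = [abs(f - p[c]) for f, p in zip(fg, probas)]
        order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
        grad = lovasz_grad([fg[i] for i in order])
        losses.append(sum(errors[i] * g for i, g in zip(order, grad)))
    return sum(losses) / len(losses) if losses else 0.0


labels = [0, 1]
print(lovasz_softmax([[1.0, 0.0], [0.0, 1.0]], labels, 2))  # 0.0
print(lovasz_softmax([[0.0, 1.0], [1.0, 0.0]], labels, 2))  # 1.0
print(round(lovasz_softmax([[0.8, 0.2], [0.3, 0.7]], labels, 2), 4))  # 0.275
```

For hard predictions the loss reduces to 1 - IoU averaged over classes, which is why a perfect prediction scores 0.0 and a completely wrong one scores 1.0.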
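The Semantic Embeddings entry leaves open how raw features are aggregated into per-category vectors. One common choice, assumed here rather than taken from the source, is a soft class centroid: each point's feature is weighted by its predicted probability for a class, and the weighted sum is normalized by that class's total probability mass. The resulting C x D matrix could then serve as the key/value sequence described in the MHCA entry.

```python
def semantic_embeddings(features, class_probs):
    """Soft per-class centroids of point features.

    features: N x D, class_probs: N x C (each row sums to 1).
    Returns C x D with embedding_c = sum_i p_ic * f_i / sum_i p_ic.
    """
    num_classes = len(class_probs[0])
    dim = len(features[0])
    embeddings = []
    for c in range(num_classes):
        mass = sum(p[c] for p in class_probs)
        acc = [0.0] * dim
        for f, p in zip(features, class_probs):
            for d in range(dim):
                acc[d] += p[c] * f[d]  # probability-weighted feature sum
        # A class with no probability mass gets a zero embedding.
        embeddings.append([a / mass if mass > 0 else 0.0 for a in acc])
    return embeddings


feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
probs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
emb = semantic_embeddings(feats, probs)
```

Here the third point is split evenly between both classes, so each embedding is pulled partway toward it: class 0 becomes [1.0, 1/3] and class 1 becomes [1/3, 1.0].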