SAM: Segment Anything Model—a visual foundation model trained on 11 million images, known for zero-shot generalization
SAM-AD: A version of SAM pre-trained by the authors on autonomous driving datasets (KITTI, nuScenes) using masked auto-encoding
OOD: Out-of-Distribution—scenarios significantly different from the training data, such as severe weather for a model trained on sunny days
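mAP: Mean Average Precision, the per-class average precision (area under the precision-recall curve) averaged over all object classes; a standard metric for object detection accuracy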
NDS: nuScenes Detection Score, a composite metric for 3D detection that combines mAP with translation, scale, orientation, velocity, and attribute errors
DWT: Discrete Wavelet Transform—a mathematical tool that decomposes a signal into low-frequency (coarse) and high-frequency (detail/noise) components
FPN: Feature Pyramid Network—a structure that generates multi-scale feature maps from a single input resolution
ViT: Vision Transformer—a transformer-based architecture for computer vision tasks
KITTI-C: A corrupted version of the KITTI dataset with synthetically added weather and sensor noise for robustness testing
nuScenes-C: A corrupted version of the nuScenes dataset with synthetically added weather and sensor noise
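
Two of the entries above, DWT and NDS, are easier to see with numbers. The following is a minimal sketch and not the authors' code: it assumes a one-level Haar wavelet (the simplest DWT; the wavelet actually used is not stated here) and the NDS weighting as defined by the nuScenes benchmark, where mAP counts for half the score and five true-positive error terms (translation, scale, orientation, velocity, attribute) share the other half. The function names `haar_dwt`, `haar_idwt`, and `nds` are hypothetical.

```python
import math

# Keys for the five mean true-positive errors used by the nuScenes benchmark.
TP_KEYS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def haar_dwt(signal):
    """One level of the Haar DWT (assumed wavelet, for illustration only).

    Returns (approx, detail): pairwise scaled sums (low-frequency, coarse)
    and pairwise scaled differences (high-frequency, detail/noise).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from both coefficient sets."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / s)
        out.append((a - d) / s)
    return out


def nds(m_ap, tp_errors):
    """nuScenes Detection Score: (5 * mAP + sum(1 - min(1, err))) / 10.

    tp_errors maps each key in TP_KEYS to its mean error; errors above 1
    are clipped so each term contributes a score in [0, 1].
    """
    missing = [k for k in TP_KEYS if k not in tp_errors]
    if missing:
        raise ValueError(f"missing TP errors: {missing}")
    tp_scores = [1.0 - min(1.0, tp_errors[k]) for k in TP_KEYS]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0
```

In this sketch a piecewise-constant signal such as `[4, 4, 2, 2]` yields all-zero detail coefficients, which is why high-frequency bands are often associated with fine detail and noise. For NDS, a detector with perfect mAP and zero true-positive errors scores 1.0, while lowering any single error term raises the score by at most 0.1.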