| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| **Cross-modal adaptation:** MM-SAM on a single non-RGB modality, compared with vanilla SAM | | | | |
| KITTI (LiDAR only) | mIoU | 39.1 | 67.4 | +28.3 |
| VT5000 (Thermal only) | mIoU | 64.1 | 78.4 | +14.3 |
| **Multi-modal fusion:** combining sensors (RGB + X) with the proposed Weakly-supervised Multi-Modal Fusion (WMMF) | | | | |
| VT5000 (RGB + Thermal) | mIoU | 66.5 | 84.0 | +17.5 |
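
The Δ column appears to be the plain difference between the "This Paper" and "Baseline" mIoU values, in percentage points. The short sketch below recomputes it from the rows above as a sanity check; the `rows` list and the `delta` helper are illustrative names, not part of the paper's code.

```python
# Rows copied from the table above: (benchmark, baseline mIoU, this-paper mIoU).
rows = [
    ("KITTI (LiDAR only)", 39.1, 67.4),
    ("VT5000 (Thermal only)", 64.1, 78.4),
    ("VT5000 (RGB + Thermal)", 66.5, 84.0),
]


def delta(baseline: float, ours: float) -> float:
    """Absolute mIoU gain in percentage points, rounded to one decimal.

    Assumption: the table's delta is a simple difference, not a relative gain.
    """
    return round(ours - baseline, 1)


# Map each benchmark to its recomputed gain.
deltas = {name: delta(baseline, ours) for name, baseline, ours in rows}

for name, gain in deltas.items():
    print(f"{name}: {gain:+.1f}")
```

The recomputed gains can be compared directly against the Δ column of the table.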