Mamba: A selective structured state space model (S6) that models sequences with complexity linear in sequence length while maintaining a global receptive field
Siamese network: An architecture with two identical subnetworks (branches) that share weights, used here to process two modalities: RGB and X, where X is a second modality such as thermal or depth
SS2D: Selective Scan 2D, a module that traverses a 2D feature map along four scan directions (each running from one corner to the opposite corner) and applies a 1D SSM to each resulting sequence to model spatial dependencies
VSSB: Visual State Space Block—the basic building block of the encoder, containing SS2D modules for spatial modeling
CroMB: Cross Mamba Block—a fusion module where selective scan parameters (matrices B, C, Delta) are generated from one modality to modulate the other, enabling cross-modal interaction
ConMB: Concat Mamba Block, a fusion module that concatenates features from the two modalities and scans the combined sequence in both forward and reverse order to integrate information
CAVSSB: Channel-Aware Visual State Space Block, a decoder block that adds pooling-based channel attention to the standard VSSB to enhance channel-specific feature selection
RGB-T: RGB-Thermal imaging
RGB-D: RGB-Depth imaging
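mIoU: Mean Intersection over Union, the per-class IoU (overlap between predicted and ground-truth pixels divided by their union) averaged over all classes; the standard accuracy metric for semantic segmentation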
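
To make the mIoU entry concrete, here is a minimal sketch of the computation in plain Python. It assumes predictions and labels are given as flat lists of integer class IDs; the function name `mean_iou`, the `ignore_index` parameter, and the choice to skip classes absent from both prediction and ground truth are illustrative assumptions, not details taken from the source. Benchmarks typically accumulate the counts over a whole test set before dividing, which this sketch does not show.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean IoU over the classes that occur in pred or target.

    pred, target: flat sequences of integer class IDs of equal length.
    ignore_index: optional label value to skip (hypothetical convention).
    """
    intersection = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel: contributes to no class
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1

    # Classes with an empty union are left out of the average.
    ious = [i / u for i, u in zip(intersection, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
# Per-class IoU: class 0 = 1/3, class 1 = 2/3, class 2 = 1/2
print(mean_iou(pred, target, num_classes=3))
```

In this toy example the three per-class values average to 0.5, up to floating-point rounding.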