M2Det: Multi-Modal Datasets and Multi-Task Object Detection, a task setting in which a single model detects objects across separate, unrelated datasets of different imaging modalities.
SAR: Synthetic Aperture Radar, an imaging technique that uses microwave radar signals. It works at night and through cloud cover, but its images look very different from optical images.
MoE: Mixture of Experts, a neural network architecture in which different sub-networks (experts) are activated for different inputs, increasing model capacity without a proportional increase in inference cost.
Sparse MoE: A variant of MoE where only a small subset (top-k) of experts is activated for any given input, keeping computation low.
HBB: Horizontal Bounding Box—standard axis-aligned detection box.
OBB: Oriented Bounding Box—rotated detection box, crucial for aerial objects like ships or vehicles that aren't axis-aligned.
DSO: Dynamic Submodule Optimization—the proposed method to adjust learning rates per module to balance convergence speeds and directions.
Grid-level Experts: MoE experts applied to individual spatial positions in the feature map, rather than routing the whole image to one expert.
EMA: Exponential Moving Average—a statistical method used here to smooth historical loss values for stability.
KL Divergence: A measure of how much one probability distribution differs from another (it is asymmetric, so not a true distance metric). DSO uses it to compare current loss distributions with historical ones and detect optimization instability.
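The top-k routing described under Sparse MoE can be sketched on scalar inputs. This is a minimal illustration, not the paper's implementation: the gate logits are passed in directly (in a real model they come from a learned router), and the helper names `top_k_indices` and `sparse_moe` are hypothetical. Renormalizing the softmax over only the selected experts is one common variant among several.

```python
import math
from typing import Callable, List, Sequence

def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_indices(logits: Sequence[float], k: int) -> List[int]:
    """Indices of the k largest logits (ties keep the lower index first)."""
    # sorted() stays stable even with reverse=True, so ties keep original order.
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

def sparse_moe(x: float,
               gate_logits: Sequence[float],
               experts: Sequence[Callable[[float], float]],
               k: int = 2) -> float:
    """Mix the outputs of the top-k experts for one input.

    Only the selected experts are called; the others are never evaluated,
    which is what keeps per-input compute low as more experts are added.
    """
    chosen = top_k_indices(gate_logits, k)
    # Hypothetical choice: renormalize gate weights over the chosen experts only.
    weights = softmax([gate_logits[i] for i in chosen])
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))
```

With four experts and gate logits `[0.1, 2.0, -1.0, 2.0]`, `k=2` selects experts 1 and 3, each with weight 0.5; experts 0 and 2 are never run.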
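The HBB and OBB entries differ in how well they fit rotated objects. The sketch below assumes an OBB parameterized as `(cx, cy, w, h, theta)` with `theta` in radians, rotating counter-clockwise in y-up coordinates. This is an assumption: detection toolkits use several angle conventions, and the source does not specify one. It converts an OBB to its corners and then to the tightest enclosing HBB.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

def obb_to_corners(cx: float, cy: float, w: float, h: float, theta: float) -> List[Point]:
    """Corners of an oriented box centered at (cx, cy), rotated by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    # Corner offsets of the unrotated box relative to its center.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta, then translate it to the center.
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in offsets]

def enclosing_hbb(corners: List[Point]) -> Box:
    """Smallest axis-aligned box containing all corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return (min(xs), min(ys), max(xs), max(ys))

def hbb_area(box: Box) -> float:
    x0, y0, x1, y1 = box
    return (x1 - x0) * (y1 - y0)
```

For a 4 by 2 object rotated 45 degrees, the OBB area is 8, but the enclosing HBB has area 18. An axis-aligned box covers more than twice the object, which is why OBBs suit tilted ships or vehicles in aerial images.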
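The Grid-level Experts entry can be illustrated by routing each spatial cell of a feature map independently. This is a simplified sketch under stated assumptions, not the paper's design. The feature map is a nested list of shape `[H][W][C]`. The router is a hypothetical linear layer given as `router_weights`. Each cell uses top-1 routing with no gate-weight scaling.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Grid = List[List[Vector]]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def route_grid(feature_map: Grid,
               router_weights: Sequence[Sequence[float]],
               experts: Sequence[Callable[[Vector], Vector]]
               ) -> Tuple[List[List[int]], Grid]:
    """Top-1 expert routing applied independently at every (row, col) cell.

    router_weights[e] is a hypothetical linear router row for expert e:
    its dot product with a cell's C-dim feature is that expert's gate logit.
    """
    assignments: List[List[int]] = []
    outputs: Grid = []
    for row in feature_map:
        a_row, o_row = [], []
        for feat in row:
            logits = [dot(w, feat) for w in router_weights]
            best = max(range(len(logits)), key=logits.__getitem__)
            a_row.append(best)
            o_row.append(experts[best](feat))  # only the chosen expert runs here
        assignments.append(a_row)
        outputs.append(o_row)
    return assignments, outputs
```

Because routing happens per cell, different regions of one image can go to different experts. Image-level routing would instead send the whole image to a single expert.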
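The EMA entry refers to the standard recurrence $s_t = \beta s_{t-1} + (1 - \beta) x_t$. A minimal sketch of smoothing a loss history follows. The default `beta = 0.9` and initializing with the first value (no bias correction) are assumptions, since the source does not give its settings.

```python
from typing import Iterable, List, Optional

def ema(values: Iterable[float], beta: float = 0.9) -> List[float]:
    """Return the running EMA s_t = beta * s_{t-1} + (1 - beta) * x_t.

    The first observation initializes the average (no bias correction);
    a larger beta gives a smoother, slower-reacting curve.
    """
    smoothed: List[float] = []
    s: Optional[float] = None
    for x in values:
        s = x if s is None else beta * s + (1 - beta) * x
        smoothed.append(s)
    return smoothed
```

For example, `ema([1.0, 3.0, 5.0], beta=0.5)` gives `[1.0, 2.0, 3.5]`. With `beta=0.9`, an oscillating loss sequence such as `[1, 5, 1, 5]` is damped to a much narrower range.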
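The KL Divergence entry says DSO compares current and historical loss distributions. The sketch below shows one way such a check could work, under explicit assumptions that do not come from the source. A module's recent per-task losses, all positive, are normalized into a probability vector. The "historical" vector is the EMA-smoothed losses. The hypothetical function `lr_scale` shrinks that module's learning rate by a fixed `damp` factor when KL(current ‖ historical) exceeds a `threshold`. The actual DSO update rule is not reproduced here.

```python
import math
from typing import List, Sequence

def normalize(losses: Sequence[float]) -> List[float]:
    """Turn positive losses into a probability vector (assumes sum > 0)."""
    total = sum(losses)
    return [l / total for l in losses]

def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

def lr_scale(current_losses: Sequence[float],
             ema_losses: Sequence[float],
             threshold: float = 0.05,
             damp: float = 0.5) -> float:
    """Hypothetical rule: damp a module's LR when its loss distribution drifts."""
    kl = kl_divergence(normalize(current_losses), normalize(ema_losses))
    return damp if kl > threshold else 1.0
```

KL is zero when the two distributions match and grows as they diverge. It is also asymmetric: `kl_divergence([0.5, 0.5], [0.9, 0.1])` is about 0.51, while the reverse is about 0.37. So the argument order (current first, history second) is a deliberate choice in this sketch.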