| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Visual Grounding results show gains of more than 24 points over existing RS-VLMs, with the largest margin on VRSBench-Ref. | ||||
| VRSBench-Ref | Accuracy @ 0.5 | 39.6 | 69.31 | +29.71 |
| DIOR-RSVG | Accuracy @ 0.5 | 48.04 | 72.47 | +24.43 |
| Change Captioning results show RSUniVLM rivals task-specific models. | ||||
| LEVIR-MCI | CIDEr | 136.56 | 139.80 | +3.24 |
| Scene Classification results show a gain on SIRI-WHU but mixed results on other datasets, including a 10.08-point drop on AID. | ||||
| SIRI-WHU | Accuracy | 62.66 | 68.13 | +5.47 |
| AID | Accuracy | 91.26 | 81.18 | -10.08 |
| Ablation study confirms the effectiveness of G-MoE over LoRA and standard MoE. | ||||
| Average VQA | Accuracy | 82.75 | 91.57 | +8.82 |
| Average VG | Accuracy | 64.56 | 70.90 | +6.34 |
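
The "Accuracy @ 0.5" metric in the visual grounding rows is commonly computed as the share of predicted boxes whose intersection over union (IoU) with the ground-truth box reaches 0.5. The sketch below illustrates that convention only; the axis-aligned `(x1, y1, x2, y2)` box format, the inclusive `>=` comparison, the percentage scaling, and the names `iou` and `accuracy_at_iou` are assumptions for illustration, not details taken from the paper or from the benchmarks' official evaluation scripts.

```python
from typing import Sequence, Tuple

# Assumed box format: axis-aligned (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlap rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def accuracy_at_iou(
    preds: Sequence[Box], gts: Sequence[Box], threshold: float = 0.5
) -> float:
    """Percentage of predictions whose IoU with the paired ground truth
    is at least `threshold` (inclusive comparison is an assumption)."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not preds:
        return 0.0
    hits = sum(iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(preds)


if __name__ == "__main__":
    # Toy data: the first prediction matches exactly, the second misses.
    predictions = [(0, 0, 10, 10), (0, 0, 10, 10)]
    ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
    print(accuracy_at_iou(predictions, ground_truth))
```

Under this reading, each Δ entry in the visual grounding rows is a difference in percentage points between two such accuracies.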