| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Medical report generation: Hulu-Med outperforms specialized baselines, even at smaller parameter counts. | | | | |
| MIMIC-CXR | RaTEScore | 51.3 | 57.0 | +5.7 |
| General multimodal understanding: comparison against proprietary SOTA models. | | | | |
| VQA-RAD | Accuracy | 76.6 | 82.7 | +6.1 |
| MMedBench | Accuracy | 74.27 | 75.13 | +0.86 |
| Video understanding: results on surgical datasets. | | | | |
| SurgeryVideoQA | Accuracy | 29.9 | 30.1 | +0.2 |
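
The Δ column appears to be the absolute difference "This Paper" minus "Baseline", in the metric's own units; this is an inference from the four rows, not something the table states. A minimal Python sketch that recomputes the column from the values above (the names `ROWS` and `delta` are hypothetical, introduced only for this check):

```python
from decimal import Decimal

# (benchmark, metric, baseline, this_paper, reported_delta), copied from the table.
ROWS = [
    ("MIMIC-CXR", "RaTEScore", "51.3", "57.0", "+5.7"),
    ("VQA-RAD", "Accuracy", "76.6", "82.7", "+6.1"),
    ("MMedBench", "Accuracy", "74.27", "75.13", "+0.86"),
    ("SurgeryVideoQA", "Accuracy", "29.9", "30.1", "+0.2"),
]


def delta(baseline: str, this_paper: str) -> str:
    """Return the signed absolute difference as a string.

    Assumption: delta = this_paper - baseline. Decimal is used instead of
    float so that the printed digits match the table exactly.
    """
    diff = Decimal(this_paper) - Decimal(baseline)
    return f"{diff:+}"


if __name__ == "__main__":
    for name, metric, baseline, this_paper, reported in ROWS:
        computed = delta(baseline, this_paper)
        status = "ok" if computed == reported else "MISMATCH"
        print(f"{name:15s} {metric:10s} {computed:>6s} (reported {reported}) {status}")
```

Under that assumption, all four reported deltas match the recomputed values. Note that the gains differ considerably in size: +5.7 and +6.1 points on MIMIC-CXR and VQA-RAD versus +0.86 and +0.2 on MMedBench and SurgeryVideoQA.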