ScienceQA results: CoMD outperforms both its teacher (LLaVA-13B) and the previous state of the art (MM-CoT Large).

| Benchmark | Metric | Baseline | CoMD (this paper) | Δ (CoMD − baseline) |
|---|---|---|---|---|
| ScienceQA | Accuracy (%) | 90.36 | 91.83 | +1.47 |
| ScienceQA | Accuracy (%) | 91.68 | 91.83 | +0.15 |
| ScienceQA | Accuracy (%) | 86.43 | 91.83 | +5.40 |

SEED-Bench results: CoMD performs strongly for a 7B model but trails InstructBLIP.

| Benchmark | Metric | Baseline | CoMD (this paper) | Δ (CoMD − baseline) |
|---|---|---|---|---|
| SEED-Bench | Accuracy (%) | 48.43 | 50.90 | +2.47 |
| SEED-Bench | Accuracy (%) | 58.76 | 50.90 | -7.86 |

LLaVA Test Set results: CoMD improves over its teacher on the conversation and detailed-description tasks.

| Benchmark | Metric | Baseline | CoMD (this paper) | Δ (CoMD − baseline) |
|---|---|---|---|---|
| LLaVA Test Set | GPT-4 Score | 85.1 | 85.7 | +0.6 |
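The Δ values are absolute differences, CoMD minus baseline. For accuracy rows that means percentage points, and for the GPT-4 Score row it means raw score points. The sketch below recomputes Δ from the transcribed values as a quick consistency check. It also reports the relative change, which is an extra derived quantity that the table itself does not list. The names `ROWS`, `delta`, and `relative_gain` are illustrative only and are not from the paper.

```python
# Values transcribed from the results table: (benchmark, metric, baseline, CoMD).
# The baseline models differ per row; see the captions above each table.
ROWS = [
    ("ScienceQA", "Accuracy (%)", 90.36, 91.83),
    ("ScienceQA", "Accuracy (%)", 91.68, 91.83),
    ("ScienceQA", "Accuracy (%)", 86.43, 91.83),
    ("SEED-Bench", "Accuracy (%)", 48.43, 50.90),
    ("SEED-Bench", "Accuracy (%)", 58.76, 50.90),
    ("LLaVA Test Set", "GPT-4 Score", 85.1, 85.7),
]


def delta(baseline: float, comd: float) -> float:
    """Absolute difference CoMD - baseline, rounded to 2 decimals like the table."""
    return round(comd - baseline, 2)


def relative_gain(baseline: float, comd: float) -> float:
    """Relative change in percent: (CoMD - baseline) / baseline * 100 (not in the table)."""
    return (comd - baseline) / baseline * 100


if __name__ == "__main__":
    for bench, metric, base, comd in ROWS:
        print(
            f"{bench:15s} {metric:13s} "
            f"delta={delta(base, comd):+.2f} rel={relative_gain(base, comd):+.2f}%"
        )
```

Rounding to two decimals hides floating-point noise such as `50.90 - 48.43` evaluating to slightly less than 2.47.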