Ablation study on ScienceQA using FLAN-Alpaca-Base demonstrates that one-stage CoT degrades performance, while the proposed Multimodal-CoT (two-stage + vision features) significantly improves it.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| ScienceQA | Accuracy | 81.63 | 85.31 | +3.68 |
| ScienceQA | Accuracy | 69.32 | 85.31 | +15.99 |
| ScienceQA | Accuracy | 79.37 | 85.31 | +5.94 |
| ScienceQA | RougeL | 90.73 | 93.46 | +2.73 |
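
The Δ column is the "This Paper" score minus the baseline score. As a minimal sketch, the snippet below recomputes each difference from the values in the table. The names `ROWS` and `recompute_deltas` are illustrative and do not come from the paper; `Decimal` is used so the two-decimal values subtract exactly.

```python
from decimal import Decimal

# Rows copied from the table: (metric, baseline, this paper, reported delta).
# Scores are kept as strings so Decimal parses them exactly.
ROWS = [
    ("Accuracy", "81.63", "85.31", "3.68"),
    ("Accuracy", "69.32", "85.31", "15.99"),
    ("Accuracy", "79.37", "85.31", "5.94"),
    ("RougeL", "90.73", "93.46", "2.73"),
]


def recompute_deltas(rows):
    """Return (metric, recomputed delta, matches reported value) for each row."""
    results = []
    for metric, baseline, ours, reported in rows:
        delta = Decimal(ours) - Decimal(baseline)
        results.append((metric, delta, delta == Decimal(reported)))
    return results


for metric, delta, ok in recompute_deltas(ROWS):
    status = "matches" if ok else "differs from"
    print(f"{metric}: +{delta} ({status} the table)")
```

All four recomputed differences agree with the Δ values reported in the table.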