| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| ThinkLite-VL models significantly outperform their base models and other open-source/proprietary baselines on the challenging MathVista benchmark. | ||||
| MathVista | Accuracy | 70.2 | 75.1 | +4.9 |
| MathVista | Accuracy | 71.9 | 79.7 | +7.8 |
| ThinkLite-VL-7B matches or improves on its baseline across a range of general visual reasoning benchmarks, with no change on ScienceQA. | ||||
| MathVerse | Score | 57.8 | 69.1 | +11.3 |
| ScienceQA | Accuracy | 95.5 | 95.5 | 0.0 |
| MMBench | Accuracy | 82.3 | 83.6 | +1.3 |
| Ablation studies show that MCTS-based selection outperforms random selection. | ||||
| Average (8 benchmarks) | Average Score | 60.89 | 64.18 | +3.29 |
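
The Δ column is the absolute gain in points, i.e. the "This Paper" value minus the "Baseline" value. As a minimal sketch, the snippet below recomputes that difference for each row of the table and flags any disagreement with the reported Δ. The names `ROWS`, `delta`, and `check_rows` are illustrative assumptions and do not come from the paper; the numbers are copied from the table as-is.

```python
# Rows copied from the table: (benchmark, baseline, this_paper, reported_delta).
# The tuple layout and all names here are hypothetical, chosen for illustration.
ROWS = [
    ("MathVista", 70.2, 75.1, 4.9),
    ("MathVista", 71.9, 79.7, 7.8),
    ("MathVerse", 57.8, 69.1, 11.3),
    ("ScienceQA", 95.5, 95.5, 0.0),
    ("MMBench", 82.3, 83.6, 1.3),
    ("Average (8 benchmarks)", 60.89, 64.18, 3.29),
]


def delta(baseline: float, this_paper: float, ndigits: int = 2) -> float:
    """Absolute gain in points, rounded to suppress floating-point noise."""
    return round(this_paper - baseline, ndigits)


def check_rows(rows):
    """Return the rows whose reported delta differs from the recomputed one."""
    return [r for r in rows if delta(r[1], r[2]) != round(r[3], 2)]


if __name__ == "__main__":
    for name, base, ours, _reported in ROWS:
        print(f"{name}: {base} -> {ours} ({delta(base, ours):+.2f})")
    print("mismatches:", check_rows(ROWS))
```

Rounding before comparing matters here because differences such as `75.1 - 70.2` are not exactly representable in binary floating point.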