| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| *SPHINX-Plus (13B) vs. the original SPHINX (13B): improvements on reasoning-heavy benchmarks* | | | | |
| MMBench (MMB) | Accuracy (%) | 67.1 | 71.0 | +3.9 |
| MM-Vet | GPT-4-evaluated score | 36.6 | 47.9 | +11.3 |
| MathVista | Accuracy (%) | 27.5 | 36.8 | +9.3 |
| *Scaling analysis: performance gains from larger models and MoE architectures* | | | | |
| MME (Cognition) | Summed subtask score | 283.6 | 367.1 | +83.5 |
| MMBench (MMB) | Accuracy (%) | 53.4 | 56.6 | +3.2 |
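The Δ column is the absolute difference between the two score columns. Because the benchmarks use different scales (MME Cognition is a summed score, the others are on a 0–100 scale), a relative gain can make rows easier to compare. The sketch below recomputes Δ from the table values and, as an addition not reported in the original table, the relative gain in percent; the row labels and the `deltas` helper are illustrative only.

```python
# Rows copied from the table: (label, baseline, this_paper).
# Labels are illustrative; the two MMBench rows come from different comparisons.
ROWS = [
    ("MMBench (SPHINX-Plus vs. SPHINX)", 67.1, 71.0),
    ("MM-Vet", 36.6, 47.9),
    ("MathVista", 27.5, 36.8),
    ("MME (Cognition)", 283.6, 367.1),
    ("MMBench (scaling)", 53.4, 56.6),
]


def deltas(rows):
    """Return (label, absolute gain, relative gain in %) per row, rounded to 1 decimal."""
    out = []
    for label, base, new in rows:
        abs_gain = round(new - base, 1)  # matches the table's Δ column
        rel_gain = round(100.0 * (new - base) / base, 1)  # not in the original table
        out.append((label, abs_gain, rel_gain))
    return out


if __name__ == "__main__":
    for label, a, r in deltas(ROWS):
        print(f"{label}: {a:+.1f} ({r:+.1f}%)")
```

Rounding to one decimal absorbs floating-point noise (e.g. `71.0 - 67.1` is not exactly `3.9` in binary floating point), so the absolute gains line up with the published Δ values.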