| Benchmark | Metric | Baseline | This Paper | Δ (This Paper − Baseline) |
|---|---|---|---|---|
| Textual reasoning results demonstrate that NoWait significantly reduces token usage while often improving accuracy on math benchmarks. | | | | |
| AMC 2023 | Accuracy | Not reported in the paper | Not reported in the paper | +4.25 |
| AMC 2023 | Generation Length | 100 | 70 | -30 |
| AMC 2023 | Accuracy | Not reported in the paper | Not reported in the paper | +6.00 |
| AIME 2025 | Accuracy | Not reported in the paper | Not reported in the paper | +1.33 |
| Multimodal results show substantial efficiency gains with minor accuracy trade-offs. | | | | |
| Image Reasoning Average (MMMU, MathVista, etc.) | Generation Length | 2000 | 1020 | -980 |
| Image Reasoning Average | Accuracy | Not reported in the paper | Not reported in the paper | -3.42 |
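
For the rows that report both a baseline and a result, the Δ column matches the result minus the baseline. The short sketch below recomputes those differences and also derives a relative change, which is not in the table. The helper names `delta` and `relative_change_pct` are illustrative, and the unit of generation length is not stated in the table, so the values are treated as plain numbers.

```python
# Recompute the Δ column for rows that list both Baseline and This Paper.
# Helper names are hypothetical; values are copied from the table above.

def delta(baseline: float, this_paper: float) -> float:
    """Absolute change: This Paper minus Baseline."""
    return this_paper - baseline


def relative_change_pct(baseline: float, this_paper: float) -> float:
    """Change relative to the baseline, in percent (derived, not in the table)."""
    return 100.0 * (this_paper - baseline) / baseline


# (benchmark, baseline generation length, generation length in this paper)
rows = [
    ("AMC 2023", 100, 70),
    ("Image Reasoning Average", 2000, 1020),
]

for name, baseline, this_paper in rows:
    d = delta(baseline, this_paper)
    r = relative_change_pct(baseline, this_paper)
    print(f"{name}: delta={d:+g}, relative={r:+.1f}%")
```

Under these assumptions the image reasoning average drops by 980, roughly half of the baseline length. The accuracy rows cannot be checked this way because only their Δ values are given.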