AdaptThink significantly reduces token usage while improving accuracy across multiple math benchmarks using DeepSeek-R1-Distill-Qwen-1.5B.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| GSM8K | Average Response Length | Not reported | Not reported | -50.9% |
| GSM8K | Accuracy | Not reported | Not reported | +4.1% |
| MATH500 | Average Response Length | Not reported | Not reported | -63.5% |
| MATH500 | Accuracy | Not reported | Not reported | +1.4% |
| AIME2024 | Average Response Length | Not reported | Not reported | -44.7% |
| AIME2024 | Accuracy | Not reported | Not reported | +1.6% |