| Benchmark | Metric | Baseline (%) | This Paper (%) | Δ (pp) |
|---|---|---|---|---|
| Math and Algorithmic Reasoning: DC enables models to discover and reuse code-based strategies, leading to near-perfect scores on algorithmic tasks (99 to 100) and gains of 27 to 30 points on competition math. | ||||
| Game of 24 | Accuracy | 10 | 99 | +89 |
| AIME 2024 | Accuracy | 23 | 50 | +27 |
| AIME 2025 | Accuracy | Not explicitly reported in the paper | Not explicitly reported in the paper | +30 |
| Math Equation Balancer | Accuracy | 45 | 100 | +55 |
| Knowledge-Intensive Tasks: DC improves performance by recalling specific domain knowledge and formulas. | ||||
| GPQA-Diamond | Accuracy | Not explicitly reported in the paper | Not explicitly reported in the paper | +9 |
| MMLU-Pro (Eng/Physics) | Accuracy | Not explicitly reported in the paper | Not explicitly reported in the paper | +8 |