| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Main results on Arithmetic Reasoning tasks, showing consistent improvements over Manual-CoT and Self-Consistency (SC). | | | | |
| Arithmetic Reasoning (Average of 5 tasks) | Accuracy | 61.3 | 64.0 | +2.7 |
| Arithmetic Reasoning (Average of 5 tasks) | Accuracy | 67.0 | 70.3 | +3.3 |
| GSM8K | Accuracy | 68.2 | 73.0 | +4.8 |
| Results on other reasoning types (Commonsense and Symbolic) and on non-reasoning tasks. | | | | |
| Commonsense Reasoning (Average) | Accuracy | 69.0 | 72.4 | +3.4 |
| Letter (4) | Accuracy | 60.6 | 63.8 | +3.2 |
| e-SNLI | Accuracy | 74.8 | 78.2 | +3.4 |