| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Zero-shot performance comparisons: SCoT improves over standard CoT on four of the five datasets, with ARC the exception. | ||||
| GSM8K | Accuracy | 52.11 | 73.16 | +21.05 |
| Tracking_Objects | Accuracy | 46.20 | 70.33 | +24.13 |
| CSQA | Accuracy | 43.98 | 59.21 | +15.23 |
| MathQA | Accuracy | 32.61 | 39.91 | +7.30 |
| ARC | Accuracy | 79.31 | 76.71 | -2.60 |
| Few-shot performance, showing the benefit of selecting demonstrations that match the strategy. | ||||
| GSM8K | Accuracy | 52.11 | 74.75 | +22.64 |
| ARC | Accuracy | 79.31 | 83.65 | +4.34 |