Performance of GReaTer on Llama-3-8B compared to baselines across three benchmarks.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| GSM8K | Accuracy | 81.1 | 82.6 | +1.5 |
| BBH | Accuracy | 72.9 | 76.6 | +3.7 |
| FOLIO | Accuracy | 62.6 | 62.6 | +0.0 |

Performance of GReaTer on Gemma-2-9B compared to baselines.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| GSM8K | Accuracy | 88.6 | 89.4 | +0.8 |
| BBH | Accuracy | 72.3 | 76.6 | +4.3 |