| Benchmark | Metric | Baseline | This Paper (IterIT) | Δ |
|---|---|---|---|---|
| **Overall performance, trained on Alpaca:** IterIT surpasses the baselines on MixEval and on the 7-task average. | | | | |
| MixEval | Score | 37.52 | 38.15 | +0.63 |
| Average (7 tasks) | Average Score | 50.18 | 51.10 | +0.92 |
| MixEval | Score | 36.26 | 39.52 | +3.26 |
| **Domain-specific generalization, trained on CodeAlpaca:** IterIT improves coding performance. | | | | |
| MBPP+ | pass@1 | 41.48 | 45.24 | +3.76 |
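In every row, Δ equals This Paper minus Baseline. The short Python sketch below recomputes that column from the table values as a consistency check. It assumes Δ is an absolute difference in score points. The relative-gain figure it also prints is included only for illustration and is not reported in the table.

```python
# Rows copied from the results table: (benchmark, metric, baseline, this_paper).
# Assumption: Δ is the absolute difference in points (this_paper - baseline).
ROWS = [
    ("MixEval", "Score", 37.52, 38.15),
    ("Average (7 tasks)", "Average Score", 50.18, 51.10),
    ("MixEval", "Score", 36.26, 39.52),
    ("MBPP+", "pass@1", 41.48, 45.24),
]


def deltas(rows):
    """Return (benchmark, absolute delta, relative gain in %) per row, rounded to 2 decimals."""
    out = []
    for name, _metric, base, ours in rows:
        abs_delta = round(ours - base, 2)  # should match the table's Δ column
        rel_gain = round(100.0 * (ours - base) / base, 2)  # illustrative, not in the table
        out.append((name, abs_delta, rel_gain))
    return out


if __name__ == "__main__":
    for name, d, rel in deltas(ROWS):
        print(f"{name}: {d:+.2f} points ({rel:+.2f}% relative)")
```

Rounding to two decimals removes floating-point noise, such as 38.15 - 37.52 evaluating to 0.6299999999999955. After rounding, the recomputed values match the reported Δ column exactly.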