Generalization results on MC4 (multilingual) show DEPT variants significantly outperforming standard baselines in perplexity:

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| MC4 (validation) | Perplexity | 39.9 | 33.0 | -6.9 |

Generalization results on The Pile (multi-domain) show a consistent perplexity improvement:

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| The Pile (validation) | Perplexity | 15.7 | 13.3 | -2.4 |
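The perplexity figures above are the standard exponent of the mean token-level negative log-likelihood. A minimal sketch of that conversion (the token count and loss value below are illustrative placeholders, not numbers from the paper):

```python
import math

def perplexity(total_nll: float, num_tokens: int) -> float:
    """Perplexity = exp(mean negative log-likelihood per token, in nats)."""
    return math.exp(total_nll / num_tokens)

# Illustrative only: a mean NLL of 3.5 nats/token yields a perplexity
# of about 33, i.e. the same ballpark as the MC4 results above.
print(round(perplexity(3500.0, 1000), 2))  # → 33.12
```

Lower perplexity means the model assigns higher average probability to the held-out tokens, which is why the negative Δ values in the tables indicate improvement.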
Downstream task accuracy after continued pre-training demonstrates the quality of DEPT's transformer body:

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| RACE | Accuracy | 32.6 | 34.5 | +1.9 |
| MNLI | Accuracy | 38.7 | 41.6 | +2.9 |
Efficiency metrics highlight the reduction in communication cost:

| Setting | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| Multilingual 1.3B model | Communication cost reduction factor (relative to baseline) | 1.0 | 714.0 | 714× |