**Main results:** distilled student models memorize significantly less than baselines trained from scratch across multiple datasets.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| FineWeb (Pythia) | Memorization Rate | 0.17 | 0.07 | -0.10 |
| Wikitext (Pythia) | Memorization Rate | 3.37 | 1.58 | -1.79 |
| Nemotron-CC-v2 (Pythia) | Memorization Rate | 0.0091 | 0.0012 | -0.0079 |

**Classification results:** memorized examples are highly predictable from pre-training features.

| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| FineWeb | AUC-ROC | 0.50 | 0.9997 | +0.4997 |
| FineWeb | Recall | not reported | 1.0000 | not reported |
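AUC-ROC and recall in the classification table are standard binary-classification metrics. A minimal stdlib sketch of how they are computed in general (the data below is hypothetical; this is not the paper's classifier or feature set). AUC-ROC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, which is why a random scorer gives the 0.50 baseline:

```python
def auc_roc(labels, scores):
    """AUC-ROC via the rank (Mann-Whitney U) formulation:
    fraction of positive/negative pairs where the positive
    is scored higher (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def recall(labels, preds):
    """Fraction of true positives that the classifier recovers."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp / (tp + fn)

# Hypothetical toy data: 1 = memorized, 0 = not memorized.
labels = [1, 1, 0, 0]
scores = [0.9, 0.8, 0.3, 0.1]   # a perfect ranking -> AUC = 1.0
print(auc_roc(labels, scores))  # 1.0
print(recall(labels, [1, 1, 0, 0]))  # 1.0
```

A recall of 1.0000, as reported above, means every memorized example was flagged; AUC near 1 additionally means the classifier's scores rank essentially all memorized examples above non-memorized ones.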