| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| The paper finds that the pretraining stage does not significantly affect the model's immediate ability to acquire knowledge (Effectivity), but model size does. | ||||
| Fictional Knowledge | Effectivity | Qualitatively similar to Late Stage | Qualitatively similar to Late Stage | Insignificant difference |
| Fictional Knowledge | Effectivity | Lower magnitude | Higher magnitude (OLMo-7B) | Positive |
| Forgetting follows a power-law relationship, and larger batch sizes reduce the rate of forgetting. | ||||
| Fictional Knowledge | Retainability Trend | N/A | Power-law fit | N/A |
| Fictional Knowledge | Retainability | Faster forgetting rate | Slower forgetting rate (Batch Size 2048) | Positive retention |
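
The "Power-law fit" entry above can be illustrated with a minimal sketch. It assumes retention decays as `a * t**(-b)` in the number of training steps `t` since the knowledge was injected, and fits `a` and `b` by ordinary least squares in log-log space. The function name `fit_power_law`, the exact functional form, and the data points are hypothetical illustrations, not the paper's procedure or measurements.

```python
import math


def fit_power_law(steps, retention):
    """Fit retention ~= a * steps**(-b) by least squares in log-log space.

    Both inputs must contain strictly positive values of equal length.
    Returns (a, b), where a larger b means faster forgetting.
    """
    xs = [math.log(t) for t in steps]
    ys = [math.log(r) for r in retention]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope and intercept of the best-fit line log(r) = intercept + slope * log(t).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic data generated from a known power law (not results from the paper).
steps = [10.0, 30.0, 100.0, 300.0, 1000.0]
retention = [0.8 * t ** -0.25 for t in steps]

a, b = fit_power_law(steps, retention)
print(f"a = {a:.3f}, b = {b:.3f}")
```

Under this assumed form, comparing the fitted exponent `b` across runs would be one way to express "faster" versus "slower" forgetting as a number: a smaller `b` corresponds to slower decay.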