| Benchmark | Metric | Baseline | This Paper | Δ |
|---|---|---|---|---|
| ParityMNIST experiments establish a strong correlation between KL divergence and forgetting in a controlled setting. | | | | |
| ParityMNIST (Toy Setting) | R² (Forgetting vs. KL) | Not applicable | 0.96 | Not applicable |
| LLM experiments show that KL divergence also predicts forgetting in large-scale settings. | | | | |
| LLM Tasks (Combined) | R² (Forgetting vs. KL) | Not applicable | 0.71 | Not applicable |
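The R² column reports how much of the variance in forgetting is explained by KL divergence. As a hedged illustration, the sketch below computes R² for an ordinary least-squares straight-line fit of forgetting against KL. The linear fit form and the function name `r_squared_linear` are assumptions made for this example; the reported values may come from a different fit (for example, a higher-degree polynomial).

```python
from statistics import fmean


def r_squared_linear(kl, forgetting):
    """R^2 of an OLS line predicting forgetting from KL divergence.

    Assumption: a simple linear fit with intercept. The fit behind the
    table's reported values may differ.
    """
    if len(kl) != len(forgetting) or len(kl) < 2:
        raise ValueError("need two equal-length sequences with >= 2 points")
    x_bar, y_bar = fmean(kl), fmean(forgetting)
    # Sums of squares and cross-products around the means.
    sxx = sum((x - x_bar) ** 2 for x in kl)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(kl, forgetting))
    ss_tot = sum((y - y_bar) ** 2 for y in forgetting)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("KL and forgetting values must both vary")
    # Least-squares slope and intercept.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares of the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(kl, forgetting))
    return 1.0 - ss_res / ss_tot


# Synthetic, illustrative data only (not results from this paper).
print(r_squared_linear([0, 1, 2, 3], [1, 2, 2, 3]))
```

For a straight-line fit with an intercept, this R² equals the squared Pearson correlation between KL and forgetting, so a value near 1 means the points lie close to a single line.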