
A Quantitative Characterization of Forgetting in Post-Training

Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan
Department of Statistics, University of California, Davis; Amazon
arXiv (2026)
RL · Pretraining · Memory

📝 Paper Summary

Continual Learning · Generative Model · Post-Training
Theoretical analysis using mixture models proves that forward-KL objectives (SFT) inherently cause mass collapse of old knowledge, whereas reverse-KL (RL) preserves it, with residual forgetting governed by the overlap between the old and new distributions.
Core Problem
Post-training procedures like SFT and RL induce catastrophic forgetting, but the specific mechanisms (weight collapse vs. parameter drift) are not theoretically quantified or distinguished.
Why it matters:
  • Practitioners use SFT and RL interchangeably for fine-tuning without understanding why SFT destroys old capabilities while RL might preserve them
  • Existing definitions of forgetting conflate 'ignoring the old task' (mass collapse) with 'corrupting the old task' (drift), hindering the design of targeted remedies
Concrete Example: When fine-tuning a model on new data only, an SFT objective (forward KL) drives the model to assign vanishing probability to old data regions ($\beta \to 0$, where $\beta$ is the weight on the old component) because the objective never evaluates them, even if the model is capable of representing both tasks perfectly.
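This collapse can be checked numerically. The sketch below (a toy illustration, not the paper's construction; the Gaussian means, sample size, and grid search are all illustrative choices) fits the old-component weight of a two-Gaussian mixture by maximum likelihood on new-task data only, which is forward KL up to a constant. The optimum puts essentially zero mass on the old mode.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss(x, mu):
    # Unit-variance Gaussian density at x with mean mu.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# "New-task" data only: samples from the new mode at mu = 5.
x_new = rng.normal(5.0, 1.0, size=2000)

def forward_kl_loss(w):
    # Forward KL to the data distribution, up to a constant,
    # equals negative log-likelihood on the new data.
    # w is the weight on the old component (mode at 0).
    mix = w * gauss(x_new, 0.0) + (1 - w) * gauss(x_new, 5.0)
    return -np.mean(np.log(mix + 1e-300))

# Grid search over the old-component weight.
ws = np.linspace(0.0, 0.99, 100)
best_w = ws[np.argmin([forward_kl_loss(w) for w in ws])]
print(best_w)  # ~0.0: any mass on the old mode is pure waste under forward KL
```

Because the objective only averages over new-task samples, every unit of probability spent on the old region strictly lowers the likelihood, so the optimizer zeroes it out.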
Key Novelty
Two-Mode Mixture Theory of Forgetting
  • Models the learner as a two-component mixture (Old vs. New) to analytically separate 'Mass Forgetting' (assigning zero weight to old tasks) from 'Old-Component Drift' (distorting old parameters)
  • Proves that forward-KL (SFT) on new data drives the old-task weight to zero at its optimum, while reverse-KL (RL) updates preserve the weight, with drift bounded by the distributions' overlap (misassignment probability)
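The reverse-KL side of the asymmetry can be illustrated the same way. In the toy sketch below (again illustrative, not the paper's setup), the target distribution still carries the old mode, as an RL objective anchored to a reference policy would; minimizing reverse KL over the learner's old-component weight then recovers the old mode's mass rather than discarding it.

```python
import numpy as np

def gauss(x, mu):
    # Unit-variance Gaussian density at x with mean mu.
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Discretize the real line for numerical integration.
xs = np.linspace(-6.0, 11.0, 4000)
dx = xs[1] - xs[0]

# Target still carries the old mode (equal weight on modes 0 and 5),
# as a KL-anchored RL objective would.
p = 0.5 * gauss(xs, 0.0) + 0.5 * gauss(xs, 5.0)

def reverse_kl(w):
    # KL(q_w || p) for a learner q_w with old-component weight w.
    q = w * gauss(xs, 0.0) + (1 - w) * gauss(xs, 5.0)
    return np.sum(q * np.log(q / p)) * dx

ws = np.linspace(0.01, 0.99, 99)
best_w = ws[np.argmin([reverse_kl(w) for w in ws])]
print(best_w)  # ~0.5: the old mode's weight survives under reverse KL
```

Contrast with the forward-KL case: there the objective never looks at old-data regions, so the old weight is squeezed to zero; here the reverse-KL penalty for mismatching the target anywhere keeps the old component alive.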
Breakthrough Assessment
8/10
Provides a rigorous theoretical foundation for a widely observed phenomenon (SFT forgets more than RL). The decomposition into mass collapse vs. drift is a valuable conceptual tool.