CPT: Continual Pretraining—training a pre-trained model on a new domain-specific corpus
SFT: Supervised Fine-Tuning—training a model on instruction-response pairs
Catastrophic Forgetting: The tendency of neural networks to lose previously learned knowledge (e.g., general math) when trained on new data (e.g., finance)
SLERP: Spherical Linear Interpolation—a method to blend two sets of weights by following the shortest arc along the surface of a high-dimensional sphere, preserving the weights' magnitude (norm) better than simple linear averaging
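As a minimal sketch (not the paper's implementation), SLERP over two flattened weight vectors can be written as follows; the fallback to linear interpolation for nearly parallel vectors is a standard numerical guard:

```python
import numpy as np

def slerp(w0, w1, t):
    """Spherical linear interpolation between two flattened weight vectors.

    Follows the great-circle arc from w0 (t=0) to w1 (t=1), which keeps the
    interpolated norm closer to the endpoints' norms than the straight-line
    average (1 - t) * w0 + t * w1 would.
    """
    w0 = np.asarray(w0, dtype=float)
    w1 = np.asarray(w1, dtype=float)
    # Angle between the two weight vectors.
    cos_theta = np.dot(w0, w1) / (np.linalg.norm(w0) * np.linalg.norm(w1))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < 1e-7:
        # Nearly parallel: the arc degenerates, so fall back to lerp.
        return (1 - t) * w0 + t * w1
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * w0 + (np.sin(t * theta) / s) * w1
```

For example, interpolating halfway between two orthogonal unit vectors yields another unit vector, whereas simple averaging would shrink the norm to about 0.71.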
SWCI: SNR-Weighted Change Intensity—a metric proposed in this paper that measures how much a parameter changed, weighted by its signal-to-noise ratio (importance)
SVDR: Singular Value Drop Ratio—a metric proposed in this paper measuring structural changes in a layer's information capacity via its singular values
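Purely as an illustration of the two metrics' intent (the paper's exact formulas may differ), SWCI can be read as a per-parameter change magnitude weighted by an importance score, and SVDR as the relative drop in a layer's singular-value mass after training:

```python
import numpy as np

def swci(w_before, w_after, snr):
    """Illustrative SWCI: mean |change| per parameter, weighted element-wise
    by an SNR (importance) score. Assumes `snr` has the same shape as the
    weight matrices; this is a sketch, not the paper's exact definition."""
    return float(np.mean(snr * np.abs(w_after - w_before)))

def svdr(w_before, w_after):
    """Illustrative SVDR: fractional drop in total singular-value mass of a
    layer after training. Positive values indicate reduced information
    capacity; again, a sketch of the idea rather than the paper's formula."""
    s_before = np.linalg.svd(w_before, compute_uv=False)
    s_after = np.linalg.svd(w_after, compute_uv=False)
    return float((s_before.sum() - s_after.sum()) / s_before.sum())
```

Under these toy definitions, an unchanged layer scores zero on both metrics, and uniformly halving a layer's weights gives an SVDR of 0.5.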
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes the main weights and trains small adapter matrices
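A minimal numpy sketch of the LoRA idea (hypothetical sizes `d` and rank `r`; real implementations train `A` and `B` with a framework such as PEFT): the frozen weight `W` is untouched, and only the small low-rank pair contributes a trainable update.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # hypothetical hidden size and LoRA rank, r << d

W = rng.normal(size=(d, d))           # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01    # trainable down-projection
B = np.zeros((d, r))                  # trainable up-projection, zero-init

def lora_forward(x, alpha=16.0):
    """Frozen path plus the scaled low-rank update (B @ A).

    With B initialized to zero, the update starts as a no-op, so training
    begins exactly at the pretrained model's behavior.
    """
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)
```

The trainable parameter count is 2*r*d instead of d*d, which is where the "parameter-efficient" label comes from.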
EWC: Elastic Weight Consolidation—a regularization technique that penalizes changes to parameters important for previous tasks
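The EWC penalty itself is a short quadratic term; a standard sketch (with the diagonal Fisher information standing in for parameter importance) looks like:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta_old_i)^2.

    Parameters with high Fisher information (important for the old task)
    are penalized more strongly for drifting from their previous values.
    """
    theta = np.asarray(theta, dtype=float)
    theta_old = np.asarray(theta_old, dtype=float)
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_old) ** 2))
```

In training, this term is added to the new task's loss, so the penalty is zero as long as the parameters stay at their old values and grows quadratically as important parameters move.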
Spectral Analysis: Analyzing the eigenvalues or singular values of weight matrices to understand their signal strength and redundancy
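As a small sketch of spectral analysis in this sense, one simple redundancy indicator (my choice of summary, not necessarily the paper's) is the number of leading singular values needed to capture most of a matrix's singular-value mass:

```python
import numpy as np

def effective_rank(W, tol=0.99):
    """Number of leading singular values capturing `tol` of the total
    singular-value mass. A value far below min(W.shape) suggests the
    layer's information is concentrated in few directions (redundancy)."""
    s = np.linalg.svd(W, compute_uv=False)
    cumulative = np.cumsum(s) / s.sum()
    return int(np.searchsorted(cumulative, tol) + 1)
```

An identity matrix spreads its mass evenly across all singular values, while a rank-1 matrix concentrates it in one, so the two extremes bracket what this indicator reports for real weight matrices.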