FoMo-in-Flux: Foundation-Models-in-Flux, the proposed benchmark containing 63 datasets for simulating realistic continual pretraining streams
MAFs: Memory-Adjusted FLOPs—a metric combining FLOP counts with peak device memory usage to budget compute resources fairly across methods
Model Merging: A technique that linearly combines the weights of a finetuned model with those of the original pretrained model to balance new knowledge against old capabilities
Plasticity: The ability of a model to learn new information from the current task
Stability: The ability of a model to retain previously learned information, i.e., to avoid forgetting
LoRA: Low-Rank Adaptation—a parameter-efficient finetuning method that injects trainable low-rank decomposition matrices into model layers
EWC: Elastic Weight Consolidation—a regularization-based continual learning method that penalizes changes to important parameters
Zero-Shot Retention: Performance of the continually updated model on a held-out set of datasets it was never trained on
Knowledge Accumulation: Performance of the model on the specific downstream tasks it has been adapted to sequentially
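
Illustration: the Model Merging, LoRA, and EWC entries above each describe a simple arithmetic operation on weights, which can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the benchmark's implementation. The function names (`merge_weights`, `lora_forward`, `ewc_penalty`), the dict-of-lists parameter layout, and the `alpha`, `scale`, and `strength` arguments are all hypothetical choices made for illustration. The merge is shown as plain linear interpolation, and the EWC importance weights are assumed to be given (in practice they are commonly estimated from a diagonal Fisher information matrix).

```python
def merge_weights(pretrained, finetuned, alpha):
    """Linear merge: (1 - alpha) * pretrained + alpha * finetuned, per parameter.

    alpha = 0 returns the pretrained weights, alpha = 1 the finetuned ones.
    Both arguments are dicts mapping a parameter name to a flat list of floats.
    """
    if pretrained.keys() != finetuned.keys():
        raise ValueError("models must share the same parameter names")
    return {
        name: [(1 - alpha) * p + alpha * f
               for p, f in zip(pretrained[name], finetuned[name])]
        for name in pretrained
    }


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, scale):
    """LoRA-style layer output: W x + scale * B (A x).

    W is the frozen weight (d_out x d_in); A (r x d_in) and B (d_out x r)
    are the trainable low-rank factors, with r much smaller than d_in, d_out.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + scale * u for b, u in zip(base, update)]


def ewc_penalty(params, anchor, importance, strength):
    """EWC-style penalty: strength / 2 * sum_i importance_i * (w_i - anchor_i)^2.

    `anchor` holds the weights after the previous task; `importance` holds a
    per-weight importance score (assumed given here).
    """
    total = 0.0
    for name in params:
        for w, w0, imp in zip(params[name], anchor[name], importance[name]):
            total += imp * (w - w0) ** 2
    return 0.5 * strength * total


if __name__ == "__main__":
    old = {"w": [0.0, 2.0]}
    new = {"w": [1.0, 4.0]}
    print(merge_weights(old, new, alpha=0.5))
    print(lora_forward([2.0, 3.0], [[1.0, 0.0], [0.0, 1.0]],
                       [[1.0, 1.0]], [[1.0], [0.0]], scale=0.5))
    print(ewc_penalty(new, old, {"w": [1.0, 0.5]}, strength=2.0))
```

In terms of the glossary, `alpha` in the merge is one knob on the Plasticity/Stability trade-off: values near 1 favor the newly learned weights, values near 0 favor the original model. The EWC penalty plays a similar role during training by making changes to important weights more costly.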