CL: Continual Learning—learning from a stream of data/tasks without forgetting previously learned information
Catastrophic Forgetting: The tendency of neural networks to drastically forget previously learned information upon learning new information
Rehearsal/Experience Replay: Storing a small subset of data from past tasks in a buffer and mixing it with new data during training to prevent forgetting
FGSM: Fast Gradient Sign Method, a single-step adversarial attack that perturbs the input by a small step in the direction of the sign of the loss gradient with respect to that input, in order to fool a model
Memory Overfitting: When a CL model memorizes the limited samples in the rehearsal buffer, losing the ability to generalize to the original class distribution
CKA: Centered Kernel Alignment, a similarity metric used to compare the representations (features) learned by two different neural networks
DER: Dynamically Expandable Representation, a state-of-the-art CL method that expands the model architecture as new tasks arrive
FOSTER: Feature Boosting and Compression—a two-stage CL method involving model expansion and compression
CIFAR10-C: A variant of the CIFAR-10 test set with common corruptions applied (e.g., blur, snow, Gaussian noise), used to test robustness
ACA: Average Classification Accuracy—the mean accuracy across all learned tasks after training is complete