CIL: Class-Incremental Learning—training a model on a sequence of tasks where new classes are added over time, without forgetting old ones
ERM: Empirical Risk Minimization—the standard training principle of minimizing average error on training data, which often leads to learning 'shortcut' features
PNS: Probability of Necessity and Sufficiency—a causal metric quantifying the probability that a cause is both necessary (outcome wouldn't happen without it) and sufficient (outcome happens with it) for an effect
CPNS: Causal PNS—the paper's proposed extension of PNS to Continual Learning, splitting it into Intra-task PNS (completeness) and Inter-task PNS (separability)
CKA: Centered Kernel Alignment—a similarity index used to measure how similar the representations (features) learned by two different networks are
Feature Collision: When the feature representation of a new class inadvertently overlaps with the frozen feature space of an old class, causing confusion
DER: Dynamically Expandable Representation—a baseline CIL method that freezes old feature extractors and adds a new one for each task
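
To make the CKA entry above concrete, here is a minimal sketch of the linear variant of Centered Kernel Alignment in Python with NumPy. It assumes the linear kernel; this glossary does not say which kernel the paper uses, so treat that choice as an assumption. The function name `linear_cka` and the toy feature matrices are hypothetical, and each matrix is assumed to hold one row per input example, with the same examples in the same order for both networks.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two feature matrices of shape (n_examples, n_features).

    The two matrices may have different feature widths, but the rows must
    correspond to the same examples in the same order.
    """
    # Center each feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)

    # Squared Frobenius norm of the cross-covariance, normalized by the
    # Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Toy data standing in for features extracted by two networks (hypothetical).
rng = np.random.default_rng(0)
feats_a = rng.normal(size=(200, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
feats_rotated = 3.0 * feats_a @ rotation               # same information, rotated and rescaled
feats_unrelated = rng.normal(size=(200, 16))           # independent features

same_score = linear_cka(feats_a, feats_a)
rotated_score = linear_cka(feats_a, feats_rotated)
unrelated_score = linear_cka(feats_a, feats_unrelated)
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so `rotated_score` matches `same_score`, while independent features score much lower. That invariance is what makes the index suitable for comparing feature extractors whose individual dimensions are not aligned with one another.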