Superposition: The phenomenon where neural networks represent more features than they have dimensions by storing them non-orthogonally
Interference: The noise introduced into a feature's reconstruction by the activation of other non-orthogonal features sharing the same subspace
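The two definitions above can be illustrated with a minimal NumPy sketch (the variable names and the random feature directions are hypothetical, not from the source): five unit-norm feature directions are packed into three dimensions, so they cannot all be orthogonal, and reconstructing one feature picks up nonzero interference from the others.

```python
import numpy as np

rng = np.random.default_rng(0)

# 5 features squeezed into 3 dimensions: the columns of W cannot all be orthogonal.
n_features, n_dims = 5, 3
W = rng.normal(size=(n_dims, n_features))
W /= np.linalg.norm(W, axis=0)  # unit-norm feature directions

# Activate feature 0 alone and reconstruct via the usual W^T W map.
x = np.zeros(n_features)
x[0] = 1.0
recon = W.T @ (W @ x)

signal = recon[0]         # ≈ 1.0, since each column has unit norm
interference = recon[1:]  # generically nonzero: other features pick up noise
print(signal, np.abs(interference).max())
```

The off-target entries of `recon` are exactly the dot products between feature 0's direction and the other features' directions; with orthogonal columns they would all be zero.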
Constructive Interference: A regime where interference from correlated features is positively correlated with the target signal, aiding reconstruction instead of degrading it
BOWS: Bag-of-Words Superposition—a dataset and framework using binary word occurrence vectors from text to study superposition with realistic correlations
Linear Superposition: A regime where superposed features can be recovered with high accuracy using a linear decoder, typically due to low-rank data structure
ReLU: Rectified Linear Unit—an activation function f(x) = max(0, x) used to filter out negative interference
SAE: Sparse Autoencoder—a model used to disentangle superposed representations into interpretable features
Antipodal pairs: A geometric arrangement where two features share a dimension but point in opposite directions (1 and -1), allowing a ReLU to separate them
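The antipodal-pair geometry can be sketched in a few lines of NumPy (a toy illustration under the stated assumption of exactly one active feature, not an implementation from the source): two features share a single hidden dimension with weights +1 and -1, and the ReLU on the decoder side zeroes out the negative interference term, recovering each feature exactly.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Two features share one hidden dimension with opposite signs (+1 and -1).
W = np.array([[1.0, -1.0]])  # shape: (1 hidden dim, 2 features)

for active in (0, 1):
    x = np.zeros(2)
    x[active] = 1.0
    h = W @ x              # scalar hidden activation: +1 or -1
    recon = relu(W.T @ h)  # ReLU filters out the negative interference
    print(active, recon)   # the active feature is recovered exactly
```

When feature 0 fires, the pre-ReLU reconstruction is [1, -1]; the ReLU clips the -1 on feature 1 to zero, so the two features never contaminate each other despite sharing a dimension.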