MUI: Model Utilization Index—a metric measuring the ratio of neurons or features a model activates to complete a task, relative to its total capacity
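A minimal sketch of how such a utilization ratio could be computed; the function name `mui`, the activation threshold, and the "active at least once during the task" criterion are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def mui(activations: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of units activated above `threshold` at least once
    during a task. Rows = timesteps/tokens, columns = units."""
    used = (activations > threshold).any(axis=0)  # per-unit: ever active?
    return used.sum() / activations.shape[1]

# Toy example: 4 timesteps x 6 units; units 0-2 fire, 3-5 stay silent.
acts = np.zeros((4, 6))
acts[:, :3] = 1.0
print(mui(acts))  # 0.5 — half the units were utilized for the task
```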
SAE: Sparse Auto-Encoder—a technique that decomposes neural activations into interpretable, monosemantic features
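The decomposition can be sketched as an overcomplete encoder/decoder pair with a ReLU bottleneck; the dimensions, the negative encoder bias, and the untrained random weights here are illustrative assumptions (a real SAE is trained with a reconstruction plus sparsity objective).

```python
import numpy as np

rng = np.random.default_rng(0)

def sae_encode(x, W_enc, b_enc):
    # ReLU encoder: sparse, non-negative feature activations
    return np.maximum(x @ W_enc + b_enc, 0.0)

def sae_decode(f, W_dec, b_dec):
    # Linear decoder reconstructs the original activation vector
    return f @ W_dec + b_dec

d_model, d_feat = 8, 32                    # overcomplete feature dictionary
W_enc = rng.normal(size=(d_model, d_feat)) * 0.1
W_dec = rng.normal(size=(d_feat, d_model)) * 0.1
b_enc = np.full(d_feat, -0.05)             # negative bias encourages sparsity
b_dec = np.zeros(d_model)

x = rng.normal(size=(1, d_model))          # a captured model activation
f = sae_encode(x, W_enc, b_enc)            # sparse feature activations
x_hat = sae_decode(f, W_dec, b_dec)        # reconstruction
sparsity = (f > 0).mean()                  # fraction of features active
```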
FFN: Feed-Forward Network—the sub-layer in Transformer blocks where neurons process information, often associated with knowledge storage
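The FFN sub-layer can be sketched as an expand–nonlinearity–project computation applied independently at each position; the dimensions and ReLU choice are illustrative (modern models often use GELU or gated variants).

```python
import numpy as np

rng = np.random.default_rng(0)

def ffn(x, W1, b1, W2, b2):
    # Position-wise FFN: expand to a wider hidden layer, apply ReLU,
    # then project back down to the model dimension.
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

d_model, d_hidden = 8, 32                  # a typical expansion factor is 4x
W1 = rng.normal(size=(d_model, d_hidden)) * 0.1
W2 = rng.normal(size=(d_hidden, d_model)) * 0.1
b1, b2 = np.zeros(d_hidden), np.zeros(d_model)

out = ffn(rng.normal(size=(2, d_model)), W1, b1, W2, b2)  # 2 positions
```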
Utility Law: The empirical observation that MUI has an inverse logarithmic relationship with model performance (lower utilization corresponds to higher performance)
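One way to read "inverse logarithmic relationship" is a fit of the form perf ≈ a − b·log(MUI); the functional form and the constants below are illustrative assumptions, not the paper's fitted values.

```python
import numpy as np

# Hypothetical fit: predicted performance falls as log-utilization rises.
a, b = 0.9, 0.2                            # illustrative constants only
mui_values = np.array([0.05, 0.1, 0.2, 0.4])
perf = a - b * np.log(mui_values)
# Smaller MUI (less of the model engaged) yields higher predicted performance.
```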
Neuron Activation Patching: A technique to determine a neuron's causal effect by swapping its activation states and observing changes in output
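The patching procedure can be sketched with a toy readout: run a clean and a corrupted input, splice one neuron's clean activation into the corrupted run, and measure the output change. The weights and activations below are invented for illustration.

```python
import numpy as np

w = np.array([2.0, -1.0, 0.5])       # toy readout weights

def output(neuron_acts):
    # Toy "model head": output is a weighted sum of neuron activations.
    return float(w @ neuron_acts)

clean = np.array([1.0, 0.0, 1.0])    # activations on the clean run
corrupt = np.array([0.0, 1.0, 1.0])  # activations on the corrupted run

# Patch neuron 0: rerun the corrupted case with neuron 0's clean activation.
patched = corrupt.copy()
patched[0] = clean[0]

effect = output(patched) - output(corrupt)
print(effect)  # 2.0 — the output change attributable to neuron 0
```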
Polysemanticity: The phenomenon where a single neuron responds to multiple unrelated concepts, complicating interpretation
MoE: Mixture-of-Experts—an architecture that activates only a subset of parameters per token, naturally optimizing for lower utilization
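The sparse activation pattern can be sketched with top-k gating: a router scores all experts but only the top k receive nonzero weight per token. The softmax-over-top-k scheme here is one common choice, assumed for illustration.

```python
import numpy as np

def top_k_route(logits, k=2):
    # Activate only the top-k experts for this token; the rest stay idle.
    idx = np.argsort(logits)[::-1][:k]   # indices of the k highest scores
    gate = np.zeros_like(logits)
    w = np.exp(logits[idx])
    gate[idx] = w / w.sum()              # softmax over the selected experts
    return gate

gates = top_k_route(np.array([0.1, 2.0, -1.0, 0.7]), k=2)
print((gates > 0).sum())  # 2 — only 2 of 4 experts are utilized
```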
Data Contamination: When test data leaks into the training set, artificially inflating performance scores