Residual Stream: The primary vector pathway in a Transformer, to which each layer adds its output; because the updates are additive, the final representation decomposes into a sum of per-layer (and per-head) contributions
TextSpan: An algorithm that projects attention head outputs into a shared text-image space to assign human-readable labels to what the head encodes
CAV: Concept Activation Vector, a direction in activation space that corresponds to a specific concept (e.g., 'gender'), usually found by training a linear classifier on activations
Mean Ablation: Replacing the output of a specific attention head with its mean output over the dataset, which removes the head's input-specific signal while preserving its average contribution to the residual stream
Cramér’s V: A statistical measure of association between two nominal variables (here, demographic group and predicted class), ranging from 0 (no association) to 1 (perfect association), used to quantify bias magnitude
FACET: A benchmark dataset for fairness in computer vision containing images labeled with occupations and demographic attributes
Zero-shot: Performing a task (here, concept definition) without using explicit training examples, relying instead on pre-trained text embeddings
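
As a rough illustration of two of the terms above, Mean Ablation and Cramér’s V, the following is a minimal sketch rather than the implementation used in any particular work. The function names `mean_ablate` and `cramers_v`, the `(n_samples, n_heads, dim)` array layout, and the choice of the standard uncorrected formula V = sqrt(chi2 / (n * (min(rows, cols) - 1))) are all assumptions made for this example.

```python
import numpy as np


def mean_ablate(head_outputs, head):
    """Replace one head's output with its dataset mean.

    head_outputs: array of shape (n_samples, n_heads, dim) holding each
    head's contribution to the residual stream (assumed layout).
    Returns a copy; the input array is left unchanged.
    """
    ablated = np.array(head_outputs, dtype=float, copy=True)
    # Mean over the sample axis gives one (dim,) vector, broadcast to all samples.
    ablated[:, head, :] = ablated[:, head, :].mean(axis=0)
    return ablated


def cramers_v(table):
    """Uncorrected Cramér's V for a contingency table of counts.

    Rows are one nominal variable (e.g. demographic group), columns the
    other (e.g. predicted class).
    """
    table = np.asarray(table, dtype=float)
    # Drop empty rows and columns so expected counts are never zero.
    table = table[table.sum(axis=1) > 0]
    table = table[:, table.sum(axis=0) > 0]
    n = table.sum()
    k = min(table.shape) - 1
    if n == 0 or k == 0:
        # Association is undefined with a single category; 0.0 is a
        # convention chosen for this sketch.
        return 0.0
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    chi2 = ((table - expected) ** 2 / expected).sum()
    return float(np.sqrt(chi2 / (n * k)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    outputs = rng.normal(size=(8, 4, 16))  # synthetic head outputs
    ablated = mean_ablate(outputs, head=2)
    # Because the residual stream is additive, summing over heads gives
    # the (partial) representation before and after ablation.
    print(np.abs(outputs.sum(axis=1) - ablated.sum(axis=1)).max())
    print(cramers_v([[30, 10], [12, 28]]))
```

Under these assumptions, comparing `cramers_v` on the predictions made before and after `mean_ablate` would give one way to gauge how much a single head contributes to a measured association.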