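Causal Tracing: A technique for identifying which hidden states causally mediate a model's prediction: the input is corrupted (e.g., by adding noise to the subject token embeddings), and individual clean activations are then restored one at a time to measure how much of the correct prediction each one recovers
Average Indirect Effect (AIE): A metric of the causal importance of a specific model component (such as the hidden state at a given layer and token position), computed as the increase in the correct answer's probability when that component's clean activation is restored into the corrupted run, averaged over prompts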
Subject Token (ST): The token(s) in the query representing the entity about which the fact is being asked (e.g., 'Space Needle')
Attribute Token (AT): The token(s) in the RAG context that contain the answer to the query (also referred to as the object)
Last Token (LT): The final token position in the input sequence, from which the next-token prediction is generated
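Attention Knockout: A method to test whether information flow between specific token positions matters for the prediction, by masking (setting to negative infinity) the pre-softmax attention scores from one token to another in selected layers, so that the query position can no longer attend to the blocked key position
Parametric Memory: Knowledge stored in the model's weights (parameters) during training, as opposed to information supplied in the RAG context; factual recall from it is typically mediated by the MLP layers
Residual Stream: The primary vector pathway in Transformers into which each layer's attention and MLP outputs are added, so that information accumulates across layers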
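The two core measurements defined above, Attention Knockout and the Average Indirect Effect, can be illustrated with a minimal pure-Python sketch. It assumes the pre-softmax attention scores and the correct-answer probabilities have already been extracted from a model; the function names (`attention_weights`, `average_indirect_effect`) and the `blocked` parameter are hypothetical, and this is not the authors' implementation. The knockout is shown for a single attention layer with a causal mask, whereas in practice it is applied across a chosen set of layers.

```python
import math


def softmax(scores):
    # Numerically stable softmax; entries equal to -inf receive exactly zero weight.
    # Assumes at least one entry in the row is finite.
    finite = [s for s in scores if s != -math.inf]
    m = max(finite)
    exps = [0.0 if s == -math.inf else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(scores, blocked=frozenset()):
    """Attention weights for one layer with a causal mask plus optional knockout.

    scores[i][j] is the (hypothetical) pre-softmax score of query position i
    attending to key position j. `blocked` is a set of (i, j) pairs whose
    scores are set to -inf before the softmax, i.e. the knocked-out edges.
    """
    weights = []
    for i, row in enumerate(scores):
        masked = [
            -math.inf if j > i or (i, j) in blocked else s  # causal mask + knockout
            for j, s in enumerate(row)
        ]
        weights.append(softmax(masked))
    return weights


def indirect_effect(p_corrupted, p_restored):
    # Gain in the correct answer's probability when one clean activation is
    # restored into the corrupted run.
    return p_restored - p_corrupted


def average_indirect_effect(runs):
    # runs: one (p_corrupted, p_restored) pair per prompt for the same component.
    effects = [indirect_effect(pc, pr) for pc, pr in runs]
    return sum(effects) / len(effects)


if __name__ == "__main__":
    scores = [[0.0] * 3 for _ in range(3)]
    # Block the last token (position 2) from attending to position 0,
    # standing in for a subject-token position.
    print(attention_weights(scores, blocked={(2, 0)})[2])  # [0.0, 0.5, 0.5]
    print(average_indirect_effect([(0.25, 0.75), (0.125, 0.375)]))  # 0.375
```

Setting a score to negative infinity, rather than zeroing the attention weight afterwards, lets the softmax renormalize over the remaining keys, which is what the knockout definition describes.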