CrAM: Credibility-aware Attention Modification—the proposed method to adjust attention weights based on document credibility
Credibility score: The estimated probability that a document does not contain misinformation
Indirect Effect (IE): A metric from causal tracing quantifying the contribution of a specific model component (e.g., attention head) to a model's output probability
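Attention head: One of several parallel attention units in a Transformer layer; each head computes its own attention weights over the input tokens, so different heads can focus on different parts of the sequence
Causal tracing: A technique for locating the parts of a neural network responsible for a specific factual prediction: the input is corrupted with noise, individual internal states are restored to their clean values, and the resulting change in output probability is measured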
SFT: Supervised Fine-Tuning—training a pre-trained model on a labeled dataset to adapt it to a specific task
EM: Exact Match—an evaluation metric measuring if the generated answer exactly matches the ground truth
F1 score: The harmonic mean of precision and recall, computed over the words shared by the prediction and the ground truth
Naive RAG: A standard retrieval-augmented generation (RAG) pipeline that retrieves documents and generates answers without any special handling of credibility or misinformation
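
To make the EM and F1 entries above concrete, here is a minimal Python sketch of how the two metrics are commonly computed for short-answer question answering. The normalization steps (lowercasing, stripping punctuation and English articles) follow a widespread convention and are an assumption here, not something the source specifies; the function names `normalize`, `exact_match`, and `f1_score` are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumed normalization; the source does not state which one is used.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, truth: str) -> bool:
    """True when the normalized prediction equals the normalized ground truth."""
    return normalize(prediction) == normalize(truth)


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of word-level precision and recall."""
    pred, gold = normalize(prediction), normalize(truth)
    if not pred or not gold:
        # Both empty counts as a perfect match; exactly one empty scores zero.
        return float(pred == gold)
    # Multiset intersection counts each shared word at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)
```

For example, the prediction "Paris, France" against the ground truth "Paris" fails exact match but still earns partial credit under F1, since one of its two words is correct (precision 0.5, recall 1.0).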