Context Faithfulness Hallucination: An error in which the correct information is present in the retrieved context but the model fails to use it, generating an answer that contradicts or ignores that context.
Attention Ratio: A normalized measure of how much attention a specific context token receives relative to the total attention on the context, used to filter out noise from high-frequency tokens.
Normalized Entropy: A metric measuring the uncertainty of the model's next-token prediction distribution; high entropy implies the model is unsure.
MSP: Maximum Softmax Probability, the probability assigned to the most likely next token; used as a proxy for model confidence.
DAGCD: Dynamic Attention-Guided Context Decoding, the proposed method, which adjusts output probabilities based on attention signals and uncertainty.
CAD: Context-Aware Decoding, a baseline method that contrasts the logits a model produces with the context against the logits it produces without the context.
Attention Sink: The phenomenon where attention heads disproportionately attend to specific tokens (like the start token or delimiters) regardless of relevance.
Greedy Decoding: A decoding strategy that selects the token with the highest probability at each step.
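
Several of the quantities defined above can be illustrated with a short Python sketch. This is not the paper's implementation. The function names, the choice to normalize entropy by the log of the vocabulary size, and the `alpha` weight are assumptions made for illustration. The CAD adjustment follows the commonly cited form (1 + alpha) * logits_with_context - alpha * logits_without_context.

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def normalized_entropy(probs):
    """Shannon entropy divided by log(vocabulary size), giving a value in [0, 1].

    Assumption: normalization by log(len(probs)); 1.0 means maximally unsure.
    """
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))

def msp(probs):
    """Maximum Softmax Probability: the probability of the most likely token."""
    return max(probs)

def greedy_token(probs):
    """Greedy decoding step: index of the highest-probability token."""
    return max(range(len(probs)), key=probs.__getitem__)

def attention_ratio(attn, context_idx):
    """Share of the context-directed attention that each context token receives.

    attn: attention weights over all input positions (hypothetical input).
    context_idx: positions that belong to the retrieved context.
    """
    total = sum(attn[i] for i in context_idx)
    return {i: attn[i] / total for i in context_idx}

def cad_logits(with_ctx, without_ctx, alpha=0.5):
    """CAD-style contrast: boost what the context adds over the no-context run."""
    return [(1 + alpha) * a - alpha * b for a, b in zip(with_ctx, without_ctx)]
```

For example, a uniform distribution over four tokens has a normalized entropy of 1 (up to floating-point error), while a sharply peaked distribution has an MSP near 1 and a normalized entropy near 0.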