Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models

📝 Paper Summary

Hallucination suppression, Modularized RAG pipeline

DAGCD mitigates context faithfulness hallucinations by using attention distributions to identify utilized context tokens and boosting their probability when the model exhibits high uncertainty.

Core Problem

RAG models often suffer from 'Context Faithfulness Hallucinations,' where they ignore retrieved information and generate unfaithful answers despite having the correct context available.

Why it matters:

Hallucinations undermine trust in RAG systems, especially in critical domains where factual accuracy is paramount.
Existing decoding methods like CAD and COIECD require multiple decoding passes, increasing computational cost, or lack interpretability regarding why context is ignored.

Concrete Example: In a QA scenario, a model might retrieve a document stating 'The team was formerly known as X,' but still generate 'The team was formerly known as Y.' The paper shows the model actually attends to 'X' (ranks it in top-10) but fails to assign it the highest probability due to low confidence.

Key Novelty

Dynamic Attention-Guided Context Decoding (DAGCD)

Detects which context tokens the model is internally 'looking at' using a classifier trained on attention weights (Attention Ratio).
Dynamically boosts the probability of these utilized context tokens during generation, but only when the model's intrinsic uncertainty (entropy) is high.
Operates in a single decoding pass without requiring external models or multiple forward passes, unlike contrastive decoding methods.

Architecture

The workflow of Dynamic Attention-Guided Context Decoding (DAGCD).

Evaluation Highlights

Achieves a gain of up to +17.67 Exact Match (EM) points over greedy decoding on pre-trained Llama-2-7B, averaged across 6 QA datasets.
Outperforms strong baselines like CAD (Context-Aware Decoding) and DoLa while being more computationally efficient (single-pass).
Demonstrates robust generalization across different model families (Llama-2, Llama-3, Mistral) and sizes (7B, 8B, 13B).

Breakthrough Assessment

7/10

Strong empirical results and a lightweight, interpretable single-pass solution to a critical RAG problem. It smartly leverages internal attention signals rather than just output logits.

⚙️ Technical Details

Problem Definition

Setting: Open-book Question Answering where a model must generate an answer 'a' given a question 'q' and retrieved context 'C'.

Inputs: Query tokens and retrieved context tokens.

Outputs: Generated answer sequence aligned with the retrieved context.

Pipeline Flow

Feature Extraction: Calculate Attention Ratio for context tokens from top-K heads
Utilization Detection: Logistic Regression classifier identifies utilized tokens
Distribution Construction: Create a utilization distribution (U) over context tokens
Dynamic Adjustment: Mix original logits with utilization distribution based on model uncertainty
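
The Distribution Construction step above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the summing of scores for repeated tokens, and the reading of the rank restriction (keep only context tokens that the model already ranks within its top-r) are all assumptions.

```python
def utilization_distribution(context_token_ids, utilization_scores, base_probs, top_r=10):
    """Build a vocabulary-sized distribution U from per-position probe scores.

    context_token_ids: vocabulary id of each context position
    utilization_scores: probe score in [0, 1] for each context position
    base_probs: the model's original next-token distribution over the vocabulary
    top_r: assumed rank restriction on the original distribution
    """
    vocab_size = len(base_probs)
    # Vocabulary ids sorted by original probability, highest first.
    ranked = sorted(range(vocab_size), key=lambda i: base_probs[i], reverse=True)
    allowed = set(ranked[:top_r])

    u = [0.0] * vocab_size
    for tok, score in zip(context_token_ids, utilization_scores):
        if tok in allowed:
            # A token that occurs several times in the context accumulates its scores.
            u[tok] += score

    total = sum(u)
    if total == 0.0:
        return u  # nothing to boost at this decoding step
    return [x / total for x in u]
```

Returning an all-zero vector when no context token survives the restriction leaves the original distribution untouched in the later mixing step.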

System Modules

Attention Feature Extractor (Context Utilization Detection)

Extracts attention weights from specific heads and normalizes them into Attention Ratios to remove noise.

Model or implementation: Base LLM (e.g., Llama-2-7B) internal attention heads
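
As a rough sketch of the normalization described above, the snippet below divides each context token's attention weight by the total attention that a head places on the context. The input layout (one list of weights per head, taken at the current generation step) and the function name are assumptions; the paper's exact formula may differ.

```python
def attention_ratios(attn_weights, context_positions):
    """Per-head Attention Ratios for the context positions.

    attn_weights: list over heads; each entry is a list of attention weights
                  from the current generation step to every input position
    context_positions: indices of the retrieved-context tokens in the input
    """
    ratios = []
    for head in attn_weights:
        context_mass = sum(head[p] for p in context_positions)
        if context_mass == 0.0:
            # The head ignores the context entirely at this step.
            ratios.append([0.0 for _ in context_positions])
        else:
            ratios.append([head[p] / context_mass for p in context_positions])
    return ratios
```

Because the denominator covers only context positions, attention mass parked on non-context tokens such as the start token does not dilute the ratios.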

Context Utilization Detector (Context Utilization Detection)

Classifies whether a context token is being 'utilized' by the model to answer the question.

Model or implementation: Logistic Regression (LR) Classifier
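
A logistic regression probe of this kind reduces to a weighted sum followed by a sigmoid. The sketch below assumes one Attention Ratio feature per selected head; the weights, bias, and 0.5 threshold are hypothetical placeholders rather than values from the paper.

```python
import math

def utilization_probability(head_ratios, weights, bias):
    """Probability that one context token is being utilized.

    head_ratios: the token's Attention Ratio in each selected head
    weights, bias: parameters of a trained logistic regression probe
    """
    z = bias + sum(w * x for w, x in zip(weights, head_ratios))
    # Numerically stable sigmoid.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def is_utilized(head_ratios, weights, bias, threshold=0.5):
    """Hard decision derived from the probe probability (threshold is an assumption)."""
    return utilization_probability(head_ratios, weights, bias) >= threshold
```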

Probability Adjuster

Modifies the output distribution by boosting probabilities of utilized context tokens, scaled by entropy.

Model or implementation: Mathematical blending function

Novel Architectural Elements

Attention-guided logit adjustment: Directly modifying output probabilities using a learned attention-based signal (Attention Ratio) in a single pass.
Uncertainty-adaptive mixing: The degree of adjustment is dynamically scaled by the token-level entropy (alpha * H(P)), intervening more when the model is uncertain.
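
One way to picture the uncertainty-adaptive mixing is the sketch below. It assumes an additive update, P + alpha * H(P) * U followed by renormalization, with H(P) normalized to [0, 1]; the summary only states that the adjustment is scaled by alpha * H(P), so the exact functional form here is an assumption.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of probs divided by its maximum, log(len(probs))."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))

def adjust_distribution(probs, utilization, alpha):
    """Boost utilized context tokens in proportion to the model's uncertainty.

    probs: original next-token distribution
    utilization: utilization distribution U over the same vocabulary
    alpha: scaling factor (model-specific hyperparameter)
    """
    strength = alpha * normalized_entropy(probs)
    mixed = [p + strength * u for p, u in zip(probs, utilization)]
    total = sum(mixed)
    return [m / total for m in mixed]
```

With a uniform distribution the entropy term is at its maximum and the boost is strongest; with a one-hot distribution the entropy is zero and the output equals the input.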

Modeling

Base Model: Evaluated on Llama-2 (7B, 13B), Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.2

Training Method: Training light-weight Logistic Regression probes on frozen LLM attention features

Adaptation: None (Inference-time intervention)

Trainable Parameters: Logistic Regression weights (negligible compared to LLM)

Training Data:

Constructed from MrQA training set
Positive samples: Context tokens matching gold answer in cases where context corrected the model
Negative samples: Context tokens not matching gold answer
Data efficient: Works well with just 100 samples
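
The labeling rule above might look like the following sketch. It rests on several assumptions: "context corrected the model" is read as "wrong without the context, right with it", negatives are drawn from the same examples, and tokens are matched case-insensitively. The dictionary keys and function name are hypothetical.

```python
def build_probe_samples(example, closed_book_correct, open_book_correct):
    """Label context positions for training the utilization probe.

    example: dict with 'context_tokens' and 'gold_answer_tokens' (lists of str)
    closed_book_correct: whether the model answered correctly without the context
    open_book_correct: whether the model answered correctly with the context
    Returns a list of (context_position, label) pairs, or [] if the example is skipped.
    """
    # Keep only examples where the context changed a wrong answer into a right one.
    if closed_book_correct or not open_book_correct:
        return []
    gold = {tok.lower() for tok in example["gold_answer_tokens"]}
    return [
        (pos, 1 if tok.lower() in gold else 0)
        for pos, tok in enumerate(example["context_tokens"])
    ]
```

Each labeled position would then be paired with its per-head Attention Ratio features to form one training row for the probe.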

Key Hyperparameters:

top_K_heads: Selected based on validation performance (specific number not fixed, varies by model)
alpha: Scaling factor for entropy adjustment (model specific)
top_R: Rank restriction for utilization distribution (e.g., top-10 or top-30)

Compute: Single A100 GPU used for experiments; Inference time ~1.05x of Greedy Decoding (negligible overhead)

Comparison to Prior Work

vs. CAD: DAGCD is single-pass (CAD requires two forward passes) and uses internal attention signals rather than just output logits.
vs. COIECD: DAGCD is single-pass and uses attention interpretability; COIECD relies on contrastive decoding.
vs. DoLa: DAGCD specifically targets context utilization in RAG, whereas DoLa is a general factuality enhancement method.
vs. ICD [not cited in paper]: DAGCD is soft-guidance based on attention and entropy, not hard constraints derived from instructions.

Limitations

Depends on the quality of retrieved context; if context is irrelevant/wrong, boosting it might hurt (though top-rank restriction helps).
Requires access to attention weights, which may not be possible with black-box API models.
The scaling factor alpha and top-K heads are hyperparameters that may need tuning per model.

Reproducibility

Code: https://github.com/uestc-huangyw/DAGCD

The code is publicly available (https://github.com/uestc-huangyw/DAGCD). The paper provides details on data construction for the LR probe and hyperparameters. The method is training-free for the LLM itself, relying only on a small probe.

📊 Experiments & Results

Evaluation Setup

Open-book QA using 6 datasets from MrQA shared task.

Benchmarks:

Natural Questions (NQ) (Open-domain QA)
TriviaQA (Open-domain QA)
HotpotQA (Multi-hop QA)
NewsQA (Reading Comprehension)
SearchQA (QA from search results)
SQuAD (Reading Comprehension)

Metrics:

Exact Match (EM)
F1 Score
Statistical methodology: Not explicitly reported in the paper
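
For reference, Exact Match and F1 for extractive QA are commonly computed in the SQuAD style sketched below: lowercase, strip punctuation and English articles, then compare strings (EM) or token overlap (F1). Whether the paper uses exactly this normalization is an assumption.

```python
import re
import string
from collections import Counter

def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))

def f1_score(prediction, gold):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

When a question has several gold answers, the usual convention is to take the maximum score over them.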

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Main results on Pretrained Models (Llama-2-7B, Llama-2-13B) show substantial gains over greedy decoding and baselines.
Average (6 datasets)	Exact Match (EM)	29.76	47.43	+17.67
Average (6 datasets)	Exact Match (EM)	35.32	48.96	+13.64
Comparison with other decoding strategies on Llama-2-7B shows DAGCD outperforms multi-pass methods.
Average (6 datasets)	Exact Match (EM)	32.19	47.43	+15.24
Average (6 datasets)	Exact Match (EM)	30.58	47.43	+16.85
Results on Instruction-Tuned Models (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.2) show smaller but positive gains.
Average (6 datasets)	Exact Match (EM)	58.12	60.37	+2.25
Average (6 datasets)	Exact Match (EM)	52.88	54.76	+1.88

Experiment Figures

Analysis of entropy and token ranking for correct vs. wrong answers.

AUC of the attention probe vs. training size.

Main Takeaways

DAGCD significantly improves context faithfulness, especially on base pre-trained models where hallucinations are more frequent (up to a ~17-point EM gain).
The method is extremely data-efficient; the attention probe trains effectively with only 100 samples.
The attention mechanism contains strong, data-independent signals about context utilization that generalize across domains.
DAGCD maintains computational efficiency (single-pass), avoiding the 2x cost of contrastive methods like CAD.

📚 Prerequisite Knowledge

Prerequisites

Transformer attention mechanisms (query, key, value)
Decoding strategies (Greedy, Sampling)
Concept of Entropy in probability distributions
Logistic Regression

Key Terms

Context Faithfulness Hallucination: When a model is given the correct retrieved information but fails to use it, generating an answer that contradicts or ignores the retrieved context.

Attention Ratio: A normalized measure of how much attention a specific context token receives relative to the total attention on the context, used to filter out noise from high-frequency tokens.

Normalized Entropy: A metric measuring the uncertainty of the model's next-token prediction distribution; high entropy implies the model is unsure.

MSP: Maximum Softmax Probability—the probability score of the most likely token; used as a proxy for model confidence.
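
Both uncertainty signals defined above can be read off the same next-token logits. The sketch below is illustrative only; the function name is hypothetical, and normalizing the entropy by the log of the vocabulary size is an assumption about what "normalized" means here.

```python
import math

def uncertainty_signals(logits):
    """Return (MSP, normalized entropy) for one vector of next-token logits."""
    # Stable softmax: subtract the maximum before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    msp = max(probs)
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    max_entropy = math.log(len(probs)) if len(probs) > 1 else 1.0
    return msp, entropy / max_entropy
```

A flat distribution gives a low MSP and a normalized entropy near 1, while a peaked distribution gives an MSP near 1 and an entropy near 0, so the two signals move in opposite directions.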

DAGCD: Dynamic Attention-Guided Context Decoding—the proposed method that adjusts output probabilities based on attention signals and uncertainty.

CAD: Context-Aware Decoding, a baseline method that contrasts the model's output logits computed with the context against its logits computed without the context, which requires two forward passes.
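
To make the cost comparison concrete, here is a minimal sketch of a CAD-style contrast, assuming the commonly cited formulation softmax((1 + alpha) * logits_with_context - alpha * logits_without_context). The two logit vectors come from two separate forward passes, which is the overhead DAGCD avoids. The function names and the default alpha are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cad_distribution(logits_with_context, logits_without_context, alpha=0.5):
    """Contrast context-conditioned logits against context-free logits.

    Tokens whose logit rises when the context is present are amplified;
    alpha = 0 recovers ordinary decoding with the context.
    """
    contrasted = [
        (1.0 + alpha) * with_c - alpha * without_c
        for with_c, without_c in zip(logits_with_context, logits_without_context)
    ]
    return softmax(contrasted)
```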

Attention Sink: The phenomenon where attention heads disproportionately attend to specific tokens (like the start token or delimiters) regardless of relevance.

Greedy Decoding: A decoding strategy that selects the token with the highest probability at each step.