When to Speak, When to Abstain: Contrastive Decoding with Abstention

📝 Paper Summary

Contrastive Decoding, Hallucination Suppression

CDA is a training-free decoding method that estimates the uncertainty of parametric and contextual knowledge to dynamically weight their usage or trigger abstention when neither is relevant.

Core Problem

Existing LLMs often hallucinate answers when lacking both internal (parametric) and external (contextual) knowledge, and current decoding methods fail to handle the scenario where neither source is reliable.

Why it matters:

Compelling models to answer when they lack information leads to severe hallucinations and loss of user trust
Current Context-Aware Contrastive Decoding (CCD) methods assume at least one knowledge source (internal or external) is always correct, failing in 'unanswerable' edge cases
Real-world applications require models to admit ignorance rather than fabricate plausible-sounding falsehoods

Concrete Example: When asked a question where the model's pre-training data is outdated (low parametric relevance) and the retrieved context is irrelevant (low contextual relevance), standard models will force an incorrect answer. CDA detects high uncertainty in both and outputs an abstention response.

Key Novelty

Contrastive Decoding with Abstention (CDA)

Introduces an explicit 'abstention' output distribution into the contrastive decoding equation
Dynamically calculates weights for parametric, contextual, and abstention distributions based on entropy-based uncertainty estimates
Increases the weight of the abstention distribution specifically when the model is uncertain about both its internal knowledge and the provided context

Architecture

The overall workflow of Contrastive Decoding with Abstention (CDA), illustrating how weights are dynamically calculated and how distributions are combined.

Evaluation Highlights

Achieves higher AUC (Area Under the Curve) scores for abstention decisions compared to self-consistency and logit-based baselines across NQ, HotpotQA, and TriviaQA
Maintains or improves generation accuracy on answerable queries while effectively abstaining on unanswerable ones, outperforming standard CCD methods
Demonstrates robustness across four LLMs (Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-2-7B-Chat, and Llama-2-13B-Chat) without requiring any model training or fine-tuning

Breakthrough Assessment

7/10

A clever, lightweight, training-free extension to contrastive decoding that addresses a critical safety issue: abstaining when no reliable knowledge is available. High practical utility for RAG systems, though heavily reliant on the quality of uncertainty estimation.

⚙️ Technical Details

Problem Definition

Setting: Open-domain Question Answering (QA) with potentially irrelevant retrieval contexts

Inputs: Query x, retrieved context c (which may be relevant or irrelevant)

Outputs: Answer y (if knowledge is available) or an abstention response (e.g., 'I don't know')

Pipeline Flow

Compute Parametric Distribution (prompt with query only)
Compute Contextual Distribution (prompt with query + context)
Compute Abstention Distribution (prompt with instruction to abstain)
Estimate Uncertainty (Entropy) for Parametric and Contextual distributions
Calibrate Uncertainty (subtract bias using null prompt)
Calculate Adaptive Weights (inverse to uncertainty)
Aggregate Distributions (weighted sum of the three distributions)
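
The steps above can be sketched for a single decoding step as follows. This is a minimal illustration on toy logit vectors, not the authors' implementation: the function names (`cda_step`, `normalized_entropy`), the temperature `tau`, the scalar `null_entropy` bias, and in particular the scoring rule that turns uncertainties into weights are assumptions chosen to mirror the description (weights inverse to uncertainty, abstention favoured only when both sources are uncertain).

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def normalized_entropy(p):
    """Shannon entropy of p divided by log(vocab size), so the result lies in [0, 1]."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum() / np.log(len(p)))

def cda_step(logits_param, logits_ctx, logits_abs, null_entropy=0.0, tau=0.1):
    """One decoding step; returns (mixed next-token distribution, weights).

    Weights are ordered (parametric, contextual, abstention).
    The scoring rule is an illustrative assumption, not the paper's formula.
    """
    # Steps 1-3: next-token distributions from the three prompts.
    p_param = softmax(logits_param)
    p_ctx = softmax(logits_ctx)
    p_abs = softmax(logits_abs)

    # Steps 4-5: entropy-based uncertainty minus a null-prompt bias (assumed scalar).
    u_param = float(np.clip(normalized_entropy(p_param) - null_entropy, 0.0, 1.0))
    u_ctx = float(np.clip(normalized_entropy(p_ctx) - null_entropy, 0.0, 1.0))

    # Step 6: low uncertainty -> high weight; abstention scores high
    # only when BOTH sources are uncertain (product of uncertainties).
    scores = np.array([1.0 - u_param, 1.0 - u_ctx, u_param * u_ctx])
    weights = softmax(scores / tau)

    # Step 7: weighted sum of the three distributions.
    mixed = weights[0] * p_param + weights[1] * p_ctx + weights[2] * p_abs
    return mixed, weights
```

Under these assumptions, a confident parametric distribution paired with a flat contextual one sends most of the weight to the parametric source, while two flat distributions let the abstention distribution dominate. A real system would obtain the three logit vectors from the same LLM under the three prompts.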

System Modules

Distribution Computer

Generate next-token logits for three scenarios: parametric (no context), contextual (with context), and abstention (explicit instruction)

Model or implementation: Any of the evaluated LLMs (e.g., Llama-3-8B-Instruct or Mistral-7B-Instruct-v0.3)

Uncertainty Estimator

Calculate entropy of the output distributions to quantify knowledge relevance

Model or implementation: Mathematical operation

Bias Calibrator

Normalize entropy scores using a 'content-free' null prompt to handle intrinsic model bias

Model or implementation: Mathematical operation
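
One way to picture the calibration step is to estimate a baseline entropy from a few content-free prompts and subtract it from each raw estimate. The sketch below assumes exactly that; the helper names (`estimate_bias`, `calibrate`) are hypothetical, the null distributions are toy arrays standing in for model outputs, and the paper's actual calibration rule may differ.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability vector."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def estimate_bias(null_distributions):
    """Average entropy over next-token distributions from content-free prompts.

    Assumption: a single scalar baseline per model is sufficient.
    """
    return float(np.mean([entropy(p) for p in null_distributions]))

def calibrate(raw_entropy, bias):
    """Subtract the null-prompt baseline from a raw entropy estimate."""
    return raw_entropy - bias

# Toy stand-ins for the model's output under two content-free prompts.
null_dists = [[0.25, 0.25, 0.25, 0.25], [0.40, 0.30, 0.20, 0.10]]
bias = estimate_bias(null_dists)
calibrated = calibrate(entropy([0.97, 0.01, 0.01, 0.01]), bias)
```

A negative calibrated value then means the model is more certain than it is on content-free input, which is the signal the weighting step would treat as relevant knowledge.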

Adaptive Weighter

Assign weights to the three distributions based on calibrated uncertainty

Model or implementation: Softmax-based function

Novel Architectural Elements

Three-way contrastive formulation: Integrating an explicit 'abstention distribution' alongside parametric and contextual distributions
Adaptive weighting mechanism: Weights are dynamically derived from real-time entropy estimates of the generation step, rather than fixed hyperparameters

Modeling

Base Model: Evaluated on Llama-3-8B-Instruct, Mistral-7B-Instruct-v0.3, Llama-2-7B-Chat, Llama-2-13B-Chat

Training Method: Training-free decoding strategy

Compute: Requires three forward passes per decoding step (parametric, contextual, abstention), plus additional passes for bias calibration, which can be pre-computed or batched

Comparison to Prior Work

vs. CAD (Context-Aware Decoding): CAD assumes the context is useful; CDA handles irrelevant context by shifting weight toward parametric knowledge or abstention
vs. Self-Consistency: CDA is a decoding strategy that alters the generation probability directly, whereas SC is a sampling/aggregation strategy
vs. Adaptive-RAG [not cited in paper]: Adaptive-RAG trains a classifier to decide whether to retrieve; CDA instead reweights its knowledge sources at every decoding step, without training

Limitations

Inference cost is high due to multiple forward passes (parametric, contextual, abstention) per token
Relies on the assumption that entropy is a reliable proxy for correctness, which may not hold for all models or domains
Calibration requires careful selection of 'null' prompts, and the best choice can vary by task

Reproducibility

Code: https://github.com/heyjoonkim/CDA

Code is publicly available. The method is training-free, relying on standard pre-trained LLMs. The paper provides detailed prompt templates for parametric, contextual, and abstention generation in Figure 3. Datasets (NQ, HotpotQA, TriviaQA) are public benchmarks.

📊 Experiments & Results

Evaluation Setup

Controlled testbed using MRQA datasets (NQ, HotpotQA, TriviaQA) where knowledge availability is explicitly manipulated (Valid/Invalid Parametric, Valid/Invalid Context)

Benchmarks:

Natural Questions (NQ) (Open-domain QA)
HotpotQA (Multi-hop QA)
TriviaQA (Trivia QA)

Metrics:

EM (Exact Match) for generation accuracy
AUC (Area Under the Curve) for abstention performance
Jaccard Similarity
Statistical methodology: Not explicitly reported in the paper
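
The three metrics can be computed along the following lines. This is a sketch under stated assumptions: the answer normalization is the common SQuAD-style one, "AUC" is taken to mean area under the ROC curve over abstention scores (label 1 = unanswerable), and Jaccard is computed over token sets; the summary does not specify these details, and the function names are illustrative.

```python
import re
import string

def normalize(text):
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))

def jaccard(a, b):
    """Token-set overlap: |A & B| / |A | B| (1.0 when both are empty)."""
    sa, sb = set(normalize(a).split()), set(normalize(b).split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

def roc_auc(scores, labels):
    """Probability that a positive outscores a negative; ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of examples, which is fine for illustration; a library routine would be the usual choice on full benchmarks.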

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Abstention performance (AUC) comparisons showing CDA's ability to distinguish answerable from unanswerable queries compared to confidence-based baselines.
Natural Questions	AUC	73.2	83.5	+10.3
HotpotQA	AUC	73.7	82.4	+8.7
TriviaQA	AUC	72.4	79.2	+6.8
Generation performance (EM) on the 'Answerable' subset of the testbed, demonstrating that adding abstention capabilities does not degrade standard QA performance.
Natural Questions	EM	45.0	47.7	+2.7
HotpotQA	EM	29.9	32.0	+2.1
Comparison against training-based abstention methods (R-Tuning) to show the efficacy of the training-free approach.
Natural Questions	AUC	70.2	79.1	+8.9

Experiment Figures

The four knowledge scenarios addressed by the testbed: (1) Relevant Parametric & Contextual, (2) Relevant Parametric only, (3) Relevant Contextual only, (4) No Relevant Knowledge (Unanswerable).
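
The four scenarios amount to a two-by-two grid over knowledge availability. A minimal sketch of the desired behavior in each cell, using a hypothetical helper `expected_behavior` that is not part of the paper's code:

```python
from itertools import product

def expected_behavior(parametric_valid: bool, context_valid: bool) -> str:
    """Desired model behavior for one cell of the testbed grid."""
    if parametric_valid or context_valid:
        return "answer"   # at least one knowledge source is relevant
    return "abstain"      # scenario 4: no relevant knowledge

# Enumerate all four combinations of (parametric, contextual) validity.
for parametric_valid, context_valid in product([True, False], repeat=2):
    behavior = expected_behavior(parametric_valid, context_valid)
    print(f"parametric={parametric_valid}, context={context_valid} -> {behavior}")
```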

Performance comparison (EM for answerable, Abstention Rate for unanswerable) on the NQ dataset across Llama-3, Mistral, and Llama-2.

Main Takeaways

CDA consistently outperforms confidence-based baselines (Logit, Self-Consistency) in abstention tasks across multiple datasets and models.
The method maintains competitive or superior generation accuracy on answerable queries, indicating that the abstention mechanism does not interfere with the use of valid knowledge.
Ablation studies confirm the necessity of bias calibration; without it, entropy estimates are unreliable.
CDA generalizes well to RAG settings where the retriever may fetch irrelevant documents, effectively filtering them out via the adaptive weighting mechanism.

📚 Prerequisite Knowledge

Prerequisites

Contrastive Decoding (CD) mechanics
Retrieval-Augmented Generation (RAG)
Entropy as an uncertainty measure in LLMs

Key Terms

Parametric Knowledge: Knowledge stored within the model's pre-trained weights

Contextual Knowledge: External knowledge provided in the prompt (e.g., from retrieval)

Contrastive Decoding: A decoding strategy that modifies the output probability distribution by contrasting a strong model/prompt against a weak one to emphasize desirable features

CCD: Context-aware Contrastive Decoding—variants of CD that contrast distributions with and without context to highlight context-grounded information
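
As a point of reference, Context-Aware Decoding (CAD), a representative CCD method, is usually written as the adjusted next-token distribution below. The notation is an assumption made here to match this summary: x is the query, c the context, y_{<t} the tokens generated so far, and α ≥ 0 a fixed contrast strength.

```latex
y_t \sim \operatorname{softmax}\!\Big[(1+\alpha)\,\operatorname{logit}_\theta(y_t \mid c, x, y_{<t})
      \;-\; \alpha\,\operatorname{logit}_\theta(y_t \mid x, y_{<t})\Big]
```

Setting α = 0 recovers ordinary decoding with the context in the prompt. CDA, as summarized here, replaces the fixed α with entropy-derived weights and adds a third, abstention distribution.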

Abstention: The model's ability to refuse to answer when it lacks sufficient information

Entropy: A measure of the unpredictability or uncertainty of a probability distribution; high entropy implies high uncertainty
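
For a next-token distribution p over a vocabulary V (symbols introduced here for illustration), Shannon entropy and its bounds are:

```latex
H(p) = -\sum_{v \in \mathcal{V}} p(v)\,\log p(v),
\qquad 0 \;\le\; H(p) \;\le\; \log\lvert\mathcal{V}\rvert
```

The lower bound is reached when all probability mass sits on one token, the upper bound when p is uniform. For example, with four tokens a uniform p gives H = log 4 ≈ 1.386 nats, whereas p = (0.97, 0.01, 0.01, 0.01) gives H ≈ 0.168 nats.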

SBERT: Sentence-BERT—a modification of the BERT network that uses siamese networks to derive semantically meaningful sentence embeddings