Jianyi Zhang, Da-Cheng Juan, Cyrus Rashtchian, Chun-Sung Ferng, Heinrich Jiang, Yiran Chen
Duke University, Google Research
arXiv
(2024)
Factuality, Reasoning, Benchmark
Paper Summary
Factuality Decoding, Hallucination Suppression
SLED improves LLM factuality by treating inference as an optimization process where final-layer logits are updated to align with latent knowledge extracted from early-layer contrasts.
Core Problem
LLMs often hallucinate because their final output distribution (logits) deviates from real-world facts, even when the model possesses the correct latent knowledge in its internal representations.
Why it matters:
Hallucinations undermine trust in LLMs for practical applications where factual accuracy is non-negotiable.
Existing solutions like Retrieval-Augmented Generation (RAG) require external databases, which can be costly or unavailable.
Prior decoding methods (like DoLa) struggle with selecting the optimal premature layer for contrast, leading to inconsistent performance.
Concrete Example: When an LLM is asked a factual question, its internal layers may encode the correct entity (e.g., 'Paris'), while the final layer's softmax distribution assigns a higher probability to an incorrect entity (e.g., 'London') because of superficial correlations. Standard decoding then picks 'London', ignoring the internal evidence for 'Paris'.
Key Novelty
Self Logits Evolution Decoding (SLED)
Interprets the training process as 'logits evolution' towards truth and replicates this at inference time by treating the final logits as a variable to be optimized.
Estimates the 'true' distribution by contrasting final-layer logits with *all* early-layer logits, using the direction of this difference to approximate the gradient towards factual truth.
Updates the final logits using a single-step gradient descent approach based on this estimated factual distribution, balancing the original output with internal latent knowledge.
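The single-step update can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: the function name `evolve_logits`, the step size `alpha`, and all numeric values (including the latent distribution `p_latent`) are assumptions chosen for illustration. It relies only on the standard fact that the gradient of KL(p_latent || softmax(logits)) with respect to the logits is softmax(logits) - p_latent.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (max-shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def evolve_logits(final_logits, p_latent, alpha):
    """One gradient-descent step on KL(p_latent || softmax(logits)).

    The gradient of that divergence with respect to the logits is
    softmax(logits) - p_latent, so the step shifts probability mass
    toward the tokens that p_latent favors.
    """
    probs = softmax(final_logits)
    return [z - alpha * (p - q) for z, p, q in zip(final_logits, probs, p_latent)]

# Toy values (assumed): the final layer slightly prefers "London",
# while the estimated latent distribution favors "Paris".
tokens = ["Paris", "London", "Rome"]
final_logits = [2.0, 2.3, 0.5]
p_latent = [0.7, 0.2, 0.1]

updated = evolve_logits(final_logits, p_latent, alpha=1.0)
before = tokens[final_logits.index(max(final_logits))]
after = tokens[updated.index(max(updated))]
print(before, "->", after)
```

With these toy numbers the top-ranked token changes from 'London' to 'Paris'. A smaller `alpha` keeps the result closer to the original output, which is the balance between original logits and latent knowledge described above.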
Architecture
The workflow of SLED. It depicts the process of extracting logits from early and final layers, estimating the latent factual distribution, and updating the final logits.
Evaluation Highlights
Consistent improvements in factual accuracy across diverse model families (Gemma, Qwen, Mixtral, GPT-OSS) and scales (1B to 45B).
Achieves state-of-the-art performance among layer-wise contrastive decoding methods on benchmarks like TruthfulQA and multiple-choice tasks.
Maintains natural language fluency and introduces negligible latency overhead compared to complex search-based methods.
Breakthrough Assessment
7/10
Offers a mathematically grounded perspective (logits evolution) on contrastive decoding that unifies previous heuristic approaches (like DoLa) and eliminates the need for manual layer selection.
Technical Details
Problem Definition
Setting: Next-token prediction where the goal is to align the output distribution with the real-world factual distribution without external data.
Inputs: Input context (prefix) x
Outputs: Modified probability distribution for the next token
Pipeline Flow
Logit Computation (Calculate logits for final layer and all early layers)
Gradient Approximation (Estimate direction towards truth using layer contrasts)
Latent Distribution Estimation (Aggregate estimates into a target distribution)
Self-Evolution (Update final logits using the target distribution)
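The four stages above can be sketched end to end as follows. This is one plausible reading of the summary, not the paper's exact procedure: the scoring rule (squared, clipped cosine similarity between the layer difference and the gradient toward each one-hot target), the function name `sled_step`, and the parameter `alpha` are all assumptions made for illustration.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def sled_step(early_logits, final_logits, alpha=1.0):
    """Hypothetical sketch of the pipeline for a single next-token step.

    early_logits: list of per-layer logit vectors (stage 1 output).
    final_logits: logit vector of the final layer.
    """
    vocab = len(final_logits)
    scores = [0.0] * vocab
    for layer in early_logits:
        # Stage 2: if logits evolve by gradient descent from early to final,
        # then (early - final) points roughly along the gradient.
        diff = [e - f for e, f in zip(layer, final_logits)]
        probs = softmax(layer)
        for i in range(vocab):
            # Gradient of KL(one_hot(i) || softmax(z)) at z = layer logits.
            grad_i = [p - (1.0 if j == i else 0.0) for j, p in enumerate(probs)]
            # Assumed scoring rule: keep only positive alignment, then square.
            scores[i] += max(cosine(diff, grad_i), 0.0) ** 2
    # Stage 3: summing scores over layers and normalizing is a weighted
    # average in which each layer's weight is its total alignment score.
    total = sum(scores)
    final_probs = softmax(final_logits)
    p_latent = [s / total for s in scores] if total > 0 else final_probs
    # Stage 4: one gradient step on KL(p_latent || softmax(final_logits)).
    new_logits = [z - alpha * (p - q)
                  for z, p, q in zip(final_logits, final_probs, p_latent)]
    return new_logits, p_latent

# Toy input (assumed values): two early layers, three-token vocabulary.
early = [[0.0, 0.0, 0.0], [0.5, 0.2, 0.0]]
final = [2.0, 1.0, 0.0]
new_logits, p_latent = sled_step(early, final, alpha=0.5)
```

If no token aligns with the evolution direction, the sketch falls back to the final-layer distribution, so the update becomes a no-op rather than dividing by zero.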
System Modules
Logit Computation
Compute logits for the final layer N and for each preceding layer n < N
Model or implementation: Various LLMs (Gemma, Qwen, Mixtral)
Gradient Approximation
Identify which tokens align with the 'evolution' direction from early to late layers
Model or implementation: Mathematical Operation
Latent Distribution Aggregation (Latent Knowledge Extraction)
Combine layer-wise estimates into a single robust target distribution
Model or implementation: Weighted Average
Logit Updater
Adjust the final layer logits to minimize divergence from the estimated latent distribution
Model or implementation: Gradient Descent Step
Novel Architectural Elements
Ensemble-based layer contrasting: Instead of picking one 'premature' layer (like DoLa), SLED aggregates signals from *all* early layers weighted by their alignment with the final layer.
Gradient-based logit refinement: Formulates decoding as an optimization step minimizing KL divergence between current logits and an estimated 'truth' distribution.
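The optimization view can be made concrete with a short derivation. The notation below is assumed for illustration and may differ from the paper's: z denotes the final-layer logits, p(z) their softmax, P the estimated latent distribution, and α a step size.

```latex
\begin{aligned}
\mathrm{KL}\bigl(P \,\|\, p(z)\bigr) &= \sum_i P_i \log \frac{P_i}{p_i(z)},
  \qquad p_i(z) = \frac{e^{z_i}}{\sum_j e^{z_j}} \\
\frac{\partial}{\partial z_k}\,\mathrm{KL}\bigl(P \,\|\, p(z)\bigr)
  &= -\sum_i P_i \,\frac{\partial \log p_i(z)}{\partial z_k}
   = -\sum_i P_i \,(\delta_{ik} - p_k(z))
   = p_k(z) - P_k \\
z &\leftarrow z - \alpha \,\bigl(p(z) - P\bigr)
\end{aligned}
```

The second line uses the fact that P sums to one. The resulting step raises the logits of tokens that the estimated distribution favors more than the current output does, and lowers the others.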
Modeling
Base Model: Evaluated on Gemma-3, Qwen-3, GPT-OSS, Mixtral (Sizes: 1B to 45B)
Comparison to Prior Work
vs. DoLa: SLED ensembles *all* early layers rather than selecting one, avoiding performance degradation seen in DoLa when the candidate set is too large.
vs. DoLa: SLED uses a gradient-based update (soft evolution) rather than a hard subtraction/contrast.
vs. ITI: SLED operates on logits (output space) rather than internal attention activations (feature space) and requires no probing dataset [not cited in paper].
Limitations
Computational overhead of computing logits for all layers (mitigated by only updating top-k tokens)
Relies on the assumption that the 'evolution' direction (early -> late) generally points towards truth, which may not hold for all hallucination types
Does not use external knowledge, so it cannot correct hallucinations where the model lacks the underlying knowledge entirely
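The top-k mitigation mentioned in the first limitation might look like the sketch below. The helper name `update_top_k` and its parameters are hypothetical, and the summary does not say how tokens outside the top k are treated; leaving them unchanged is an assumption made here.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_top_k(final_logits, p_latent, k, alpha=1.0):
    """Apply the gradient step only to the k highest-scoring tokens.

    Assumption: tokens outside the top k keep their original logits.
    """
    top = sorted(range(len(final_logits)),
                 key=lambda i: final_logits[i], reverse=True)[:k]
    probs = softmax(final_logits)
    new_logits = list(final_logits)
    for i in top:
        new_logits[i] -= alpha * (probs[i] - p_latent[i])
    return new_logits

# Toy values (assumed): only tokens 0 and 2 are in the top 2.
logits = [3.0, 1.0, 2.5, -1.0]
latent = [0.1, 0.1, 0.8, 0.0]
restricted = update_top_k(logits, latent, k=2)
```

Restricting the update this way bounds the per-token work by k rather than by the vocabulary size, at the cost of ignoring any latent evidence for low-ranked tokens.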
Reproducibility
No replication artifacts mentioned in the paper. Code URL is not provided. The method is described mathematically, allowing for reimplementation by experts, but specific hyperparameters for the 'soft estimation' (squaring mean values) and 'evolution scale' (top-k selection) would require tuning.
Experiments & Results
Evaluation Setup
Zero-shot factuality evaluation across diverse tasks
Benchmarks:
TruthfulQA (Multiple-choice factuality)
FACTOR (Factuality evaluation, News/Wiki)
StrategyQA (Reasoning / Question Answering)
GSM8K (Chain-of-Thought Reasoning)
Metrics:
Accuracy (Acc)
Factual Accuracy
Statistical methodology: Not explicitly reported in the paper
Experiment Figures
A visual demonstration of how SLED downweights incorrect tokens compared to standard decoding.
Main Takeaways
SLED consistently improves factual accuracy compared to standard decoding and DoLa across varied model sizes (1B-45B).
The method is robust to the choice of layers, unlike DoLa which is sensitive to the candidate layer set size.
The 'soft estimation' strategy (using a target distribution) outperforms 'hard estimation' (picking a single token), suggesting preserving uncertainty is beneficial.
SLED is complementary to other decoding strategies and can be combined with them for further gains.
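The difference between soft and hard estimation in the takeaways above can be illustrated with a small sketch. The per-token alignment `scores` and both helper names are hypothetical; the summary does not specify how the paper computes either target.

```python
def hard_target(scores):
    """One-hot target on the best-scoring token (discards uncertainty)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]

def soft_target(scores):
    """Normalized scores as a distribution (keeps relative evidence)."""
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must contain at least one positive value")
    return [s / total for s in scores]

# Toy alignment scores (assumed) for three candidate tokens.
scores = [0.6, 0.3, 0.1]
hard = hard_target(scores)
soft = soft_target(scores)
```

The hard target commits fully to one token, while the soft target still gives weight to the runner-up, which is the preserved uncertainty that the takeaway credits for the better results.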
Logits: The raw, unnormalized scores output by the final layer of a neural network before the softmax function is applied.
KL Divergence: A statistical measure quantifying how one probability distribution differs from a second, reference probability distribution.
DoLa: Decoding by Contrasting Layers, a prior method that contrasts the final layer with a specific early layer to amplify factual signals.
Softmax: A function that converts a vector of numbers (logits) into a vector of probabilities that sum to one.
Latent Knowledge: Factual information that is implicitly encoded in the model's internal parameters and hidden states but may not be correctly surfaced in the final output.
Soft Targets: Probability distributions used as targets during training (or here, evolution) that are not simple 0/1 indicators, allowing for uncertainty and richer information.