SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models

📝 Paper Summary

Factuality Decoding, Hallucination Suppression

SLED improves LLM factuality by treating inference as an optimization process where final-layer logits are updated to align with latent knowledge extracted from early-layer contrasts.

Core Problem

LLMs often hallucinate because their final output distribution (the softmax of the final-layer logits) deviates from real-world facts, even when the model's internal representations encode the correct latent knowledge.

Why it matters:

Hallucinations undermine trust in LLMs for practical applications where factual accuracy is non-negotiable.
Existing solutions like Retrieval-Augmented Generation (RAG) require external databases, which can be costly or unavailable.
Prior decoding methods (like DoLa) struggle with selecting the optimal premature layer for contrast, leading to inconsistent performance.

Concrete Example: When an LLM is asked a factual question, its internal layers might encode the correct entity (e.g., 'Paris'), but the final layer's softmax distribution might assign a higher probability to an incorrect entity (e.g., 'London') due to superficial correlations. Standard decoding picks 'London', ignoring the internal evidence for 'Paris'.

Key Novelty

Self Logits Evolution Decoding (SLED)

Interprets the training process as 'logits evolution' towards truth and replicates this at inference time by treating the final logits as a variable to be optimized.
Estimates the 'true' distribution by contrasting final-layer logits with *all* early-layer logits, using the direction of this difference to approximate the gradient towards factual truth.
Updates the final logits using a single-step gradient descent approach based on this estimated factual distribution, balancing the original output with internal latent knowledge.
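
The single-step update described above can be written out explicitly. This is a sketch under stated assumptions: the symbols $\ell_N$ (final-layer logits), $P_{\text{latent}}$ (estimated factual distribution), and $\alpha$ (step size) are notation chosen here for illustration, and the direction of the KL divergence is assumed rather than taken from the summary. The gradient identity in the second line is the standard result for a softmax output.

```latex
\begin{aligned}
\mathcal{L}(\ell) &= \mathrm{KL}\big(P_{\text{latent}} \,\|\, \mathrm{softmax}(\ell)\big), \\
\nabla_{\ell}\,\mathcal{L}(\ell) &= \mathrm{softmax}(\ell) - P_{\text{latent}}, \\
\tilde{\ell}_N &= \ell_N - \alpha\,\big(\mathrm{softmax}(\ell_N) - P_{\text{latent}}\big).
\end{aligned}
```

Tokens that the latent estimate favors more than the current output have their logits raised, and tokens it favors less have them lowered, each entry by at most $\alpha$.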

Architecture

The workflow of SLED. It depicts the process of extracting logits from early and final layers, estimating the latent factual distribution, and updating the final logits.

Evaluation Highlights

Consistent improvements in factual accuracy across diverse model families (Gemma, Qwen, Mixtral, GPT-OSS) and scales (1B to 45B).
Achieves state-of-the-art performance among layer-wise contrastive decoding methods on benchmarks like TruthfulQA and multiple-choice tasks.
Maintains natural language fluency and introduces negligible latency overhead compared to complex search-based methods.

Breakthrough Assessment

7/10

Offers a mathematically grounded perspective (logits evolution) on contrastive decoding that unifies previous heuristic approaches (like DoLa) and eliminates the need for manual layer selection.

⚙️ Technical Details

Problem Definition

Setting: Next-token prediction where the goal is to align the output distribution with the real-world factual distribution without external data.

Inputs: Input context (prefix) x

Outputs: Modified probability distribution for the next token

Pipeline Flow

Logit Computation (Calculate logits for final layer and all early layers)
Gradient Approximation (Estimate direction towards truth using layer contrasts)
Latent Distribution Estimation (Aggregate estimates into a target distribution)
Self-Evolution (Update final logits using the target distribution)

System Modules

Logit Computation

Compute logits for the final layer N and for every preceding layer n < N

Model or implementation: Various LLMs (Gemma, Qwen, Mixtral, GPT-OSS)
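
One way to obtain per-layer logits is sketched below. It assumes, as in DoLa-style early exit, that the model's shared output head is applied to each layer's hidden state; the summary does not spell this out. The names `project_to_logits`, `layerwise_logits`, and the toy matrices are hypothetical, and real models typically also apply a final normalization before the head, which is omitted here.

```python
def project_to_logits(hidden, unembed):
    """Map one hidden state (length d) to vocabulary logits via unembed (V x d)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def layerwise_logits(hidden_states, unembed):
    """Return (early_logits, final_logits) from a list of per-layer hidden states."""
    all_logits = [project_to_logits(h, unembed) for h in hidden_states]
    return all_logits[:-1], all_logits[-1]


# Toy setup (assumed values): 3 layers, hidden size 2, vocabulary of 3 tokens.
hidden_states = [[0.1, 0.0], [0.5, 0.2], [1.0, 0.5]]
unembed = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

early_logits, final_logits = layerwise_logits(hidden_states, unembed)
```

Here `early_logits` holds one logit vector per early layer n < N, and `final_logits` is the vector that the later steps update.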

Gradient Approximation (Latent Knowledge Extraction)

Identify which tokens align with the 'evolution' direction from early to late layers

Model or implementation: Mathematical Operation
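
The alignment test can be sketched as follows, assuming the score for a token is the clipped cosine similarity between the observed logit change (early minus final) and the gradient the early layer would have received if that token were the true one. The function names and the clipping at zero are illustrative assumptions, not a statement of the paper's exact formula.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def alignment_scores(early_logits, final_logits):
    """Score each token by how well 'it is the true token' explains early -> final."""
    # Under a gradient-descent view, final ~= early - step * gradient,
    # so (early - final) should point along the gradient at the early layer.
    evolution = [e - f for e, f in zip(early_logits, final_logits)]
    probs = softmax(early_logits)
    scores = []
    for tok in range(len(final_logits)):
        # Cross-entropy gradient w.r.t. logits if `tok` were the true token.
        grad = [p - (1.0 if j == tok else 0.0) for j, p in enumerate(probs)]
        scores.append(max(cosine(evolution, grad), 0.0))  # keep aligned tokens only
    return scores


# Toy case (assumed values): the final layer raised token 0 relative to the early layer.
early = [0.0, 0.0, 0.0]
final = [2.0, 0.0, 0.0]
scores = alignment_scores(early, final)
```

In this toy case only token 0 receives a positive score, since it is the only token whose hypothetical gradient matches the direction in which the logits actually moved.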

Latent Distribution Aggregation (Latent Knowledge Extraction)

Combine layer-wise estimates into a single robust target distribution

Model or implementation: Weighted Average
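
A minimal sketch of the aggregation step, assuming each layer contributes a distribution built from its squared alignment scores and is weighted by its share of the total squared score. Both the squaring and the weighting rule are assumptions made for illustration; `aggregate_latent` and the uniform fallback are hypothetical.

```python
def aggregate_latent(per_layer_scores):
    """Weighted average of per-layer distributions built from alignment scores."""
    vocab = len(per_layer_scores[0])
    squared = [[s * s for s in layer] for layer in per_layer_scores]
    masses = [sum(layer) for layer in squared]
    total = sum(masses)
    if total == 0.0:
        # No layer gives any signal: fall back to uniform (an assumption).
        return [1.0 / vocab] * vocab
    latent = [0.0] * vocab
    for layer, mass in zip(squared, masses):
        if mass == 0.0:
            continue  # this layer carries no signal
        weight = mass / total                   # layer weight
        layer_dist = [s / mass for s in layer]  # per-layer distribution
        for tok in range(vocab):
            latent[tok] += weight * layer_dist[tok]
    return latent


# Toy alignment scores (assumed values) for two early layers over three tokens.
per_layer_scores = [[0.8, 0.0, 0.1], [0.2, 0.0, 0.0]]
latent = aggregate_latent(per_layer_scores)
```

Layers with stronger overall alignment pull the target distribution harder, so no single layer has to be chosen in advance.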

Logit Updater

Adjust the final layer logits to minimize divergence from the estimated latent distribution

Model or implementation: Gradient Descent Step
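
The update itself can be sketched as one gradient step restricted to the top-k tokens. Assumptions: the step size `alpha`, the value of `top_k`, and leaving non-top-k logits unchanged are illustrative choices, and the token names and numbers are a toy version of the Paris/London example rather than model output.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def evolve_logits(final_logits, latent, alpha=1.0, top_k=2):
    """Take one gradient step on the top_k final-layer logits; leave the rest as-is."""
    probs = softmax(final_logits)
    ranked = sorted(range(len(final_logits)), key=lambda i: final_logits[i], reverse=True)
    keep = set(ranked[:top_k])
    return [
        logit - alpha * (probs[i] - latent[i]) if i in keep else logit
        for i, logit in enumerate(final_logits)
    ]


tokens = ["London", "Paris", "Rome", "Berlin"]  # toy vocabulary
final_logits = [2.0, 1.0, 0.0, -1.0]            # standard decoding would pick "London"
latent = [0.1, 0.9, 0.0, 0.0]                   # assumed latent estimate favoring "Paris"

evolved = evolve_logits(final_logits, latent, alpha=1.0, top_k=2)
best = tokens[max(range(len(evolved)), key=lambda i: evolved[i])]
```

With these toy numbers the step is large enough to move "Paris" above "London"; a smaller `alpha` would keep the output closer to the original distribution.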

Novel Architectural Elements

Ensemble-based layer contrasting: Instead of picking one 'premature' layer (like DoLa), SLED aggregates signals from *all* early layers weighted by their alignment with the final layer.
Gradient-based logit refinement: Formulates decoding as an optimization step that minimizes the KL divergence between the distribution induced by the current logits and an estimated 'truth' distribution.

Modeling

Base Model: Evaluated on Gemma-3, Qwen-3, GPT-OSS, Mixtral (Sizes: 1B to 45B)

Comparison to Prior Work

vs. DoLa: SLED ensembles *all* early layers rather than selecting one, avoiding performance degradation seen in DoLa when the candidate set is too large.
vs. DoLa: SLED uses a gradient-based update (soft evolution) rather than a hard subtraction/contrast.
vs. ITI (Inference-Time Intervention): SLED operates on logits (output space) rather than internal attention activations (feature space) and requires no probing dataset. ITI is not cited in the paper.
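
The difference between a hard contrast and a soft gradient step can be illustrated with a small sketch. The DoLa-style function below uses the log-probability ratio between the final layer and one premature layer and omits DoLa's plausibility filtering; the SLED-style function assumes a latent distribution is already available. All names and numbers are hypothetical.

```python
import math


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def dola_style_contrast(final_logits, premature_logits):
    """Hard contrast: log-probability ratio between final and one early layer."""
    lf, lp = log_softmax(final_logits), log_softmax(premature_logits)
    return [a - b for a, b in zip(lf, lp)]


def sled_style_update(final_logits, latent, alpha=1.0):
    """Soft evolution: nudge final logits toward an estimated latent distribution."""
    probs = [math.exp(x) for x in log_softmax(final_logits)]
    return [l - alpha * (q - p) for l, q, p in zip(final_logits, probs, latent)]


final_logits = [2.0, 1.0, 0.0]
premature_logits = [2.0, 0.0, 0.0]  # one hand-picked early layer (toy values)
latent = [0.2, 0.8, 0.0]            # assumed aggregate over all early layers

contrast = dola_style_contrast(final_logits, premature_logits)
evolved = sled_style_update(final_logits, latent, alpha=1.0)
```

The contrast replaces the final scores with a ratio that depends entirely on the chosen premature layer, while the gradient step moves each final logit by at most `alpha`, so the original output is preserved up to a bounded correction.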

Limitations

Computational overhead of computing logits for all layers (mitigated by only updating top-k tokens)
Relies on the assumption that the 'evolution' direction (early -> late) generally points towards truth, which may not hold for all hallucination types
Does not use external knowledge, so it cannot correct hallucinations where the model lacks the underlying knowledge entirely

Reproducibility

No replication artifacts mentioned in the paper. Code URL is not provided. The method is described mathematically, allowing for reimplementation by experts, but specific hyperparameters for the 'soft estimation' (squaring mean values) and 'evolution scale' (top-k selection) would require tuning.

📊 Experiments & Results

Evaluation Setup

Zero-shot factuality evaluation across diverse tasks

Benchmarks:

TruthfulQA (Multiple-choice factuality)
FACTOR (Factuality evaluation on News/Wiki)
StrategyQA (Reasoning / Question Answering)
GSM8K (Chain-of-Thought Reasoning)

Metrics:

Accuracy (Acc)
Factual Accuracy
Statistical methodology: Not explicitly reported in the paper
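
For the multiple-choice benchmarks, accuracy is commonly computed by scoring each candidate answer and checking whether the top-scoring one is correct. The sketch below assumes each option already has a scalar score (for example a summed token log-probability under the decoding method being tested); the summary does not describe the paper's exact protocol, and the example scores are made up.

```python
def pick_answer(option_scores):
    """Index of the highest-scoring option."""
    return max(range(len(option_scores)), key=lambda i: option_scores[i])


def accuracy(examples):
    """examples: list of (option_scores, correct_index) pairs."""
    correct = sum(1 for scores, gold in examples if pick_answer(scores) == gold)
    return correct / len(examples)


# Toy data (assumed values): three questions with per-option scores.
examples = [
    ([-3.2, -1.1, -4.0], 1),
    ([-2.0, -2.5], 0),
    ([-0.9, -0.4, -1.5], 0),
]
acc = accuracy(examples)
```

Comparing decoding methods then amounts to recomputing the option scores with each method and comparing the resulting accuracies.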

Experiment Figures

A visual demonstration of how SLED downweights incorrect tokens compared to standard decoding.

Main Takeaways

SLED consistently improves factual accuracy compared to standard decoding and DoLa across varied model sizes (1B-45B).
The method is robust to the choice of layers, unlike DoLa which is sensitive to the candidate layer set size.
The 'soft estimation' strategy (using a target distribution) outperforms 'hard estimation' (picking a single token), suggesting preserving uncertainty is beneficial.
SLED is compatible with other decoding strategies and can be combined with them for further gains.
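
The soft versus hard distinction in the takeaways can be made concrete with a toy sketch. It assumes per-token alignment scores are already available; squaring before normalizing and the uniform fallback are illustrative assumptions, and both function names are hypothetical.

```python
def soft_target(scores):
    """Normalize squared scores into a distribution that keeps uncertainty."""
    squared = [s * s for s in scores]
    total = sum(squared)
    if total == 0.0:
        return [1.0 / len(scores)] * len(scores)  # no signal: uniform (assumption)
    return [s / total for s in squared]


def hard_target(scores):
    """Put all probability mass on the single best-scoring token."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return [1.0 if i == best else 0.0 for i in range(len(scores))]


# Toy scores (assumed values): two tokens are almost equally well supported.
scores = [0.6, 0.5, 0.0]
soft = soft_target(scores)
hard = hard_target(scores)
```

When two candidates are nearly tied, the hard target discards the runner-up entirely, while the soft target keeps substantial mass on it.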

📚 Prerequisite Knowledge

Prerequisites

Transformer architecture (layers, hidden states, logits)
Contrastive Decoding
Kullback-Leibler (KL) Divergence
Gradient Descent

Key Terms

Logits: The raw, unnormalized scores a network produces before the softmax function is applied. In SLED, logits are computed for the final layer and for each early layer.

KL Divergence: A statistical measure quantifying how one probability distribution differs from a second, reference probability distribution.

DoLa: Decoding by Contrasting Layers, a prior method that contrasts the final layer with a single selected early layer to amplify factual signals.

Softmax: A function that converts a vector of numbers (logits) into a vector of probabilities that sum to one.

Latent Knowledge: Factual information that is implicitly encoded in the model's internal parameters and hidden states but may not be correctly surfaced in the final output.

Soft Targets: Probability distributions used as targets during training (or here, evolution) that are not simple 0/1 indicators, allowing for uncertainty and richer information.
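
The terms above can be tied together with a few lines of plain Python. The example values are arbitrary and only illustrate the definitions of softmax, KL divergence, and hard versus soft targets.

```python
import math


def softmax(logits):
    """Convert logits into probabilities that sum to one."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


logits = [1.0, 2.0, 0.5]         # raw, unnormalized scores
model_probs = softmax(logits)

hard_target = [0.0, 1.0, 0.0]    # 0/1 indicator
soft_target = [0.1, 0.8, 0.1]    # keeps some uncertainty

kl_hard = kl_divergence(hard_target, model_probs)
kl_soft = kl_divergence(soft_target, model_probs)
```

KL divergence is zero only when the two distributions match, which is why it serves as the quantity the logit update tries to reduce.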