DeLTa: A Decoding Strategy based on Logit Trajectory Prediction Improves Factuality and Reasoning Ability

📝 Paper Summary

Hallucination suppression
Decoding strategies

DeLTa improves LLM factuality and reasoning by treating layer-wise logits as a time series and using linear regression to predict an extrapolated 'virtual layer' distribution where correct tokens are more probable.

Core Problem

LLMs frequently generate hallucinations and logical errors because standard decoding relies on the final layer's logits, which may not fully capture the model's internal confidence evolution.

Why it matters:

Hallucinations pose severe risks in high-stakes fields like medicine and law where factual accuracy is paramount
Existing mitigation methods (dataset selection, loss modification) require expensive retraining or additional data
Current inference-time methods like DoLa rely on fixed layer buckets or simple differencing, potentially missing the continuous trend of information refinement across layers

Concrete Example: When answering a factual question, an LLM might assign high probability to a plausible but incorrect entity in early layers. While the probability of the correct entity rises in later layers, standard decoding might still pick the wrong one if the final layer isn't decisive enough. DeLTa extrapolates this rising trend to a virtual layer where the correct token dominates.

Key Novelty

Decoding by Logit Trajectory (DeLTa)

Treats the sequence of logits from intermediate layers to the final layer as a time-series trajectory
Applies linear regression to this trajectory to predict logits at a hypothetical 'virtual layer' beyond the model's actual depth
Leverages the observation that correct token probabilities tend to increase linearly across higher layers, amplifying this signal for better selection

Architecture

Conceptual diagram of DeLTa. It illustrates how logits for a correct token (e.g., 'Paris') increase across layers while incorrect ones fluctuate or decrease. DeLTa fits a line to these logits and extrapolates to a virtual layer to select the correct token.
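
A toy numeric illustration of the idea in the diagram may help. Every number below is made up for illustration: the logit values, the competing token 'Lyon', the 32-layer depth, and the choice of virtual layer 34 are assumptions, not figures from the paper.

```python
import numpy as np

# Hypothetical logits at the last four layers (29..32) of an assumed 32-layer model.
layers = np.array([29.0, 30.0, 31.0, 32.0])
logits = {
    "Paris": np.array([2.0, 3.0, 4.0, 5.0]),  # rising steadily
    "Lyon": np.array([5.6, 5.5, 5.5, 5.4]),   # roughly flat, still ahead at layer 32
}
virtual_layer = 34.0  # assumed extrapolation target beyond the final layer


def extrapolate(y, x=layers, target=virtual_layer):
    """Fit logit ~ layer_index with least squares and evaluate the line at `target`."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return slope * target + intercept


# Standard decoding looks only at the final layer; the sketch looks at the fitted line.
final_choice = max(logits, key=lambda tok: logits[tok][-1])
delta_choice = max(logits, key=lambda tok: extrapolate(logits[tok]))
print(final_choice, delta_choice)
```

In this toy case the final layer still prefers the flat token, while the extrapolated line prefers the rising one.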

Evaluation Highlights

+4.9-point improvement on TruthfulQA (%True*Info) using Llama-3.1-8B compared to raw model output
+8.1-point accuracy gain on StrategyQA using Llama-3.1-8B, enhancing reasoning capabilities without retraining
+7.3-point accuracy improvement on GSM8K using Llama-3.1-8B, demonstrating gains in chain-of-thought reasoning

Breakthrough Assessment

7/10

Offers a computationally lightweight, training-free method that consistently improves both factuality and reasoning across multiple models. It refines the intuition of prior work (DoLa) into a more general regression framework.

⚙️ Technical Details

Problem Definition

Setting: Next-token prediction in an N-layer Transformer where we seek to maximize the probability of factually correct tokens

Inputs: Input token sequence x_<t

Outputs: Next token x_t selected based on extrapolated logits

Pipeline Flow

Forward Pass (compute logits at all layers)
Layer Selection (choose start layer N_mid)
Logit Extraction (collect logits from N_mid to N)
Trajectory Regression (fit linear model per token)
Extrapolation (predict logits at virtual layer L)
Candidate Filtering (restrict to top candidates)
Softmax & Sampling (generate next token)
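
The steps above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the (N, V) array layout, and passing the candidate set in as a precomputed boolean mask are all choices made here for illustration.

```python
import numpy as np


def delta_logits(layer_logits, n_mid, virtual_layer):
    """Extrapolate per-token logits to `virtual_layer`.

    layer_logits: array of shape (N, V); row i holds the logits of layer i + 1.
    n_mid: 1-based start layer, must satisfy n_mid < N so at least two points are fit.
    """
    n_layers = layer_logits.shape[0]
    x = np.arange(n_mid, n_layers + 1, dtype=float)  # layer indices N_mid..N
    y = layer_logits[n_mid - 1:]                     # shape (len(x), V)
    x_c = x - x.mean()
    # Closed-form least-squares slope and intercept, one pair per vocabulary token.
    slope = (x_c[:, None] * (y - y.mean(axis=0))).sum(axis=0) / (x_c ** 2).sum()
    intercept = y.mean(axis=0) - slope * x.mean()
    return slope * virtual_layer + intercept


def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def delta_step(layer_logits, n_mid, virtual_layer, candidate_mask, rng):
    """One decoding step: extrapolate, filter to candidates, softmax, sample.

    candidate_mask must contain at least one True entry.
    """
    z = delta_logits(layer_logits, n_mid, virtual_layer)
    z = np.where(candidate_mask, z, -np.inf)  # tokens outside the set get zero probability
    probs = softmax(z)
    return rng.choice(len(probs), p=probs), probs
```

A call would look like `delta_step(layer_logits, n_mid, virtual_layer, mask, np.random.default_rng(0))`, where `layer_logits` holds the per-layer logits for the current position.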

System Modules

Logit Extractor

Obtains the logit vectors from intermediate layer N_mid up to the final layer N using Logit Lens

Model or implementation: Base LLM (e.g., Llama-3.1-8B)

Trajectory Regressor (Trajectory Prediction)

Fits a linear regression model (logit ~ layer_index) for each token in the vocabulary (or candidate set) to capture the trend

Model or implementation: Ordinary Least Squares (OLS) Linear Regression

Extrapolator (Trajectory Prediction)

Calculates the predicted logit at a virtual layer L (usually > N) to amplify the signal of the correct token

Model or implementation: Linear Equation
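
For a single token, the regression and extrapolation steps correspond to the standard closed-form least-squares fit. The notation here is introduced for illustration and may differ from the paper's: $z_\ell$ is the token's logit at layer $\ell$, and $\bar{\ell}$ and $\bar{z}$ are the means of the layer indices and logits over layers $N_{\mathrm{mid}}, \dots, N$.

```latex
\hat{a} = \frac{\sum_{\ell=N_{\mathrm{mid}}}^{N} (\ell - \bar{\ell})(z_\ell - \bar{z})}
               {\sum_{\ell=N_{\mathrm{mid}}}^{N} (\ell - \bar{\ell})^{2}},
\qquad
\hat{b} = \bar{z} - \hat{a}\,\bar{\ell},
\qquad
\hat{z}_L = \hat{a}\,L + \hat{b} = \bar{z} + \hat{a}\,(L - \bar{\ell})
```

The last form shows that a token with positive slope $\hat{a}$ gains more the further the virtual layer $L$ lies beyond the mean fitted layer, which is how the extrapolation amplifies rising trajectories.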

Candidate Filter

Restricts the final distribution to a dynamic candidate set (V_head) to reduce noise from irrelevant tokens

Model or implementation: Adaptive Thresholding
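
This summary does not spell out the thresholding rule. One common form, used in contrastive decoding and DoLa, keeps tokens whose final-layer probability is at least a fraction alpha of the top probability. The sketch below assumes that rule. The function name and the default alpha = 0.1 are hypothetical, and DeLTa's actual filter may differ.

```python
import numpy as np


def candidate_mask(final_logits, alpha=0.1):
    """Boolean mask over the vocabulary: True where p(token) >= alpha * max p.

    Assumed rule (adaptive plausibility constraint), not confirmed for DeLTa.
    """
    z = final_logits - final_logits.max()      # shift for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return probs >= alpha * probs.max()
```

Because the threshold scales with the top probability, the candidate set shrinks when the model is confident and grows when the distribution is flat.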

Novel Architectural Elements

Inference-time modification replacing the final layer logits with linearly extrapolated logits derived from a multi-layer trajectory

Modeling

Base Model: Evaluated on Qwen2.5-7B, Mistral-7B-v0.1, and Llama-3.1-8B

Training Method: Training-free decoding strategy

Compute: Inference only. For Qwen2.5-7B, latency increases to approximately 1.4x the baseline because of the added regression computation.

Reproducibility

Code: https://github.com/githubhyz/DeLTa

📊 Experiments & Results

Evaluation Setup

Zero-shot or few-shot (6-shot) generation on open-ended and multiple-choice QA tasks

Benchmarks:

TruthfulQA (Factual accuracy in open QA)
StrategyQA (Multi-hop reasoning)
GSM8K (Math word problems, reasoning)
TriviaQA (Closed-book QA)
Natural Questions (NQ) (Closed-book QA)

Metrics:

%True*Info (TruthfulQA)
Exact Match / Accuracy (TriviaQA, NQ, StrategyQA, GSM8K)
Statistical methodology: Not explicitly reported in the paper

Key Results

Benchmark	Metric	Baseline	This Paper	Δ
Performance on Factuality benchmarks (TruthfulQA, TriviaQA, Natural Questions) showing consistent gains over baselines.
TruthfulQA	%True*Info	44.0	48.9	+4.9
TriviaQA	Accuracy	39.1	44.1	+5.0
Natural Questions	Accuracy	11.5	13.0	+1.5
Performance on Reasoning benchmarks (StrategyQA, GSM8K) showing significant improvements, particularly where prior methods struggled.
GSM8K	Accuracy	42.8	50.1	+7.3
StrategyQA	Accuracy	57.8	65.9	+8.1
GSM8K	Accuracy	31.0	38.2	+7.2

Experiment Figures

The mean coefficient of determination (R^2) of the logit trajectory linear regression at different layer depths across three models.

Main Takeaways

DeLTa consistently improves factuality (TruthfulQA) and reasoning (GSM8K) across multiple model families (Llama-3.1, Qwen2.5, Mistral), whereas baselines like DoLa are inconsistent and sometimes hurt performance.
Analysis of R^2 values shows that logit trajectories become highly linear in the upper layers of Transformers (R^2 ≈ 0.9), supporting the linear regression approach.
The method is effective without requiring any model training or external data, though it incurs a moderate computational cost during inference.
Filtering alone (applying the candidate set V_head without regression) provides some gain, but adding DeLTa's regression yields a significant further improvement, indicating that the trajectory prediction itself contributes.

📚 Prerequisite Knowledge

Prerequisites

Transformer architecture (layers, logits)
Logit Lens concept (inspecting intermediate layers)
Linear regression (least squares)

Key Terms

Logit Lens: A technique to project hidden states from intermediate Transformer layers into the vocabulary space to view the model's prediction at that specific layer
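
A minimal NumPy sketch of the projection, using random toy weights. It assumes the intermediate hidden state is passed through the model's final normalization (RMSNorm in Llama-style models) before the unembedding matrix. Whether DeLTa applies the norm this way is an assumption, and the names `rms_norm`, `logit_lens`, and `W_unembed` are hypothetical.

```python
import numpy as np


def rms_norm(h, weight, eps=1e-6):
    """RMSNorm: scale each vector to unit root-mean-square, then apply a learned gain."""
    return h / np.sqrt((h ** 2).mean(axis=-1, keepdims=True) + eps) * weight


def logit_lens(hidden_states, norm_weight, W_unembed):
    """Project per-layer hidden states into vocabulary space.

    hidden_states: (num_layers, d_model) for a single token position.
    W_unembed: (vocab_size, d_model) unembedding matrix.
    Returns (num_layers, vocab_size) logits, one row per layer.
    """
    return rms_norm(hidden_states, norm_weight) @ W_unembed.T


rng = np.random.default_rng(0)
num_layers, d_model, vocab_size = 4, 8, 10
hidden = rng.normal(size=(num_layers, d_model))      # toy hidden states
W_unembed = rng.normal(size=(vocab_size, d_model))   # toy unembedding matrix
layer_logits = logit_lens(hidden, np.ones(d_model), W_unembed)
print(layer_logits.shape)
```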

DoLa: Decoding by Contrasting Layers—a baseline method that amplifies factual knowledge by contrasting logits from late layers against early layers

TruthfulQA: A benchmark designed to measure whether language models generate truthful answers and avoid reproducing common human falsehoods

GSM8K: Grade School Math 8K—a dataset of high-quality grade school math word problems used to test reasoning

CoT: Chain-of-Thought—a prompting technique where models generate intermediate reasoning steps

logit: The raw, unnormalized score assigned to each vocabulary token before the softmax function is applied; normally taken from the final layer, but also obtainable from intermediate layers via Logit Lens

virtual layer: A hypothetical layer index L (typically L > N) used in DeLTa's regression model to extrapolate logit values beyond the final physical layer

R^2: Coefficient of determination—a statistical measure of how well the regression predictions approximate the real data points
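
As a small illustration, R^2 for a straight-line fit of one token's logits against layer index can be computed as below. The function name is hypothetical, and the sketch assumes the logits are not all identical (otherwise the denominator is zero).

```python
import numpy as np


def r_squared(x, y):
    """R^2 of the least-squares line y ~ slope * x + intercept."""
    slope, intercept = np.polyfit(x, y, deg=1)
    residuals = y - (slope * x + intercept)
    ss_res = (residuals ** 2).sum()        # variation left unexplained by the line
    ss_tot = ((y - y.mean()) ** 2).sum()   # total variation around the mean
    return 1.0 - ss_res / ss_tot
```

A value near 1 means the trajectory is close to a straight line over the fitted layers; lower values indicate more fluctuation around the line.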