โ† Back to Paper List

Improving reasoning at inference time via uncertainty minimisation

Nicolas Legrand, Kenneth Enevoldsen, Márton Kardos, Kristoffer Nielbo
Center for Humanities Computing, Aarhus University, Denmark
arXiv (2026)
Reasoning · Factuality

๐Ÿ“ Paper Summary

Inference-time scaling · Reasoning · Uncertainty estimation
A verifier-free inference method that selects intermediate reasoning steps by maximizing the model's internal self-certainty, improving performance by stabilizing early planning without external supervision.
Core Problem
Existing inference-time scaling methods are computationally expensive (requiring full rollouts) or unreliable (token-level uncertainty is noisy), while external verifiers require costly training.
Why it matters:
  • Token-level metrics often conflate epistemic and aleatoric uncertainty, leading to confident but incorrect hallucinations
  • Full-chain sampling (e.g., Best-of-N) wastes compute on dead-end paths that could be pruned earlier
  • Reasoning requires dynamic uncertainty resolution (planning), which static decoding strategies fail to capture
Concrete Example: When solving a math problem, a model may transiently become more uncertain while formulating a plan. Token-level greedy decoding can then pick a high-probability but generic phrase that leads into a dead end, whereas maximizing thought-level self-certainty selects the specific sub-derivation the model is most internally committed to.
Key Novelty
Thought-Level Self-Certainty Maximization
  • Shift the unit of analysis from tokens to 'thoughts' (intermediate reasoning steps defined by delimiters) to capture semantic coherence
  • Select the next reasoning step from k sampled candidates by maximizing self-certainty: the average, over the step's tokens, of the KL divergence between the model's predictive distribution and the uniform distribution
  • Use internal signals exclusively, removing the need for trained verifiers or external reward models
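The selection rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes self-certainty is the mean per-token KL divergence from the uniform distribution, KL(p ‖ u) = log V − H(p), and the function and variable names (`self_certainty`, `select_thought`, `candidates`) are invented for the example.

```python
import numpy as np

def self_certainty(token_logits):
    """Mean per-token KL(p || uniform) for one candidate reasoning step.

    token_logits: array of shape (n_tokens, vocab_size) with the model's
    logits at each position of the sampled 'thought'.
    """
    logits = np.asarray(token_logits, dtype=np.float64)
    logits = logits - logits.max(axis=-1, keepdims=True)  # stabilize softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)
    vocab = probs.shape[-1]
    # KL(p || u) = sum_v p(v) log(p(v) * V) = log(V) - H(p)
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=-1)
    return float((np.log(vocab) - entropy).mean())

def select_thought(candidates):
    """Pick, from k sampled steps, the one the model is most committed to.

    candidates: list of (thought_text, token_logits) pairs.
    """
    return max(candidates, key=lambda c: self_certainty(c[1]))[0]
```

A peaked predictive distribution (the model is committed to one continuation) scores high, while a near-uniform one scores near zero, which is why this signal can separate a concrete sub-derivation from a hedging, generic continuation.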
Evaluation Highlights
  • Up to 4x accuracy improvement on Danish GSM8K using Qwen-1.5B compared to greedy decoding
  • Matches or exceeds Self-Consistency (Majority Voting) baselines on MATH500 and GSM8K under comparable token budgets
  • Sampling only during the first 1–5 reasoning steps achieves peak accuracy, outperforming sampling at every step (inverted U-shaped performance)
Breakthrough Assessment
7/10
Offers a principled, compute-efficient alternative to Majority Voting that relies purely on internal signals. The finding that early-step uncertainty minimization drives performance is a significant insight into LLM reasoning dynamics.