Can I Have Your Order? Monte-Carlo Tree Search for Slot Filling Ordering in Diffusion Language Models

JOJ Leang, Y Zhao, MC Stoian, W Li, SB Cohen…
arXiv, 2/2026 (2026)
Reasoning Agent

📝 Paper Summary

Masked Diffusion Models (MDMs) · Non-autoregressive generation · Reasoning tasks (Code, Math)
McDiffuSE improves Masked Diffusion Models by treating the order of text generation as a decision-making problem, using Monte Carlo Tree Search to find optimal non-sequential slot orderings.
Core Problem
Masked Diffusion Models often fail on reasoning tasks because generating tokens simultaneously or in poor orders breaks logical dependencies, leading to incoherent outputs.
Why it matters:
  • Current diffusion models underperform compared to autoregressive models on complex reasoning tasks due to sensitivity to generation order.
  • Heuristic planners (like confidence-based selection) fail to account for long-range dependencies, propagating errors across iterations.
  • Closing this gap allows diffusion models to leverage their efficiency benefits without sacrificing accuracy on math and code generation.
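The myopia of confidence-based selection can be illustrated with a small sketch. It is a toy, not the paper's setup: `toy_confidence`, `greedy_order`, and `order_score` are hypothetical names, and the confidence numbers are invented so that one "header" slot (slot 0) unlocks high confidence for the others.

```python
from typing import Callable, FrozenSet, List

# Hypothetical stand-in for a diffusion model's confidence in filling a slot,
# given the set of slots that are already filled. Not taken from the paper.
ConfidenceFn = Callable[[int, FrozenSet[int]], float]


def toy_confidence(slot: int, filled: FrozenSet[int]) -> float:
    """Invented scores: slot 0 is a 'header' that the other slots depend on."""
    if slot == 0:
        return 0.6                      # header: moderately confident on its own
    if 0 in filled:
        return 0.9                      # body slots are easy once the header exists
    return 0.65 if slot == 1 else 0.5   # body slots without the header


def greedy_order(num_slots: int, confidence: ConfidenceFn) -> List[int]:
    """At each step, fill the unfilled slot with the highest current confidence."""
    filled: FrozenSet[int] = frozenset()
    order: List[int] = []
    while len(order) < num_slots:
        remaining = [s for s in range(num_slots) if s not in filled]
        best = max(remaining, key=lambda s: confidence(s, filled))
        order.append(best)
        filled = filled | {best}
    return order


def order_score(order: List[int], confidence: ConfidenceFn) -> float:
    """Sum of the confidences observed when slots are filled in this order."""
    filled: FrozenSet[int] = frozenset()
    total = 0.0
    for slot in order:
        total += confidence(slot, filled)
        filled = filled | {slot}
    return total


greedy = greedy_order(3, toy_confidence)   # → [1, 0, 2]
header_first = [0, 1, 2]
print(greedy, order_score(greedy, toy_confidence))
print(header_first, order_score(header_first, toy_confidence))
```

In this toy, the greedy planner commits to slot 1 because it looks slightly better in isolation, and ends with a lower total score than the header-first order. The effect is built into the invented numbers; it only shows how a one-step heuristic can miss a dependency that pays off later.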
Concrete Example: When generating a Python function, a standard model might try to fill in the function body before the definition header. McDiffuSE instead finds that generating the function declaration first gives the remaining slots the right context, which keeps the code structurally consistent.
Key Novelty
Monte Carlo Diffusion Search (McDiffuSE)
  • Formulates slot selection as a Markov Decision Process where actions are choices of which masked segment (slot) to generate next.
  • Uses Monte Carlo Tree Search with a 'lookahead' rollout mechanism to estimate the long-term coherence of a generation order before committing to it.
  • Integrates model confidence as both a prior for search expansion and a reward signal, effectively performing inference-time planning without auxiliary training.
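The three points above can be sketched as a small tree search over slot orderings. This is a minimal illustration under stated assumptions, not the authors' implementation: the state is the tuple of slots filled so far, `toy_confidence` is a hypothetical stand-in for the model's confidence, the selection rule is a generic PUCT-style formula, and the rollout completes the order greedily. All names and constants are invented for the example.

```python
import math
from typing import Dict, List, Tuple

State = Tuple[int, ...]  # slots filled so far, in the order they were chosen


def toy_confidence(slot: int, state: State) -> float:
    """Hypothetical model confidence for filling `slot` next (invented numbers)."""
    if slot == 0:
        return 0.6                      # 'header' slot
    if 0 in state:
        return 0.9                      # body slots are easy once the header exists
    return 0.65 if slot == 1 else 0.5


def actions(state: State, num_slots: int) -> List[int]:
    """MDP actions: any slot that is still masked."""
    return [s for s in range(num_slots) if s not in state]


def rollout_sum(state: State, num_slots: int) -> float:
    """Lookahead: finish the order greedily and sum the confidences seen."""
    total = 0.0
    while len(state) < num_slots:
        slot = max(actions(state, num_slots), key=lambda s: toy_confidence(s, state))
        total += toy_confidence(slot, state)
        state += (slot,)
    return total


class Node:
    def __init__(self, prior: float) -> None:
        self.prior = prior              # confidence used as the expansion prior
        self.visits = 0
        self.value_sum = 0.0
        self.children: Dict[int, "Node"] = {}


def puct(parent: Node, child: Node, c: float) -> float:
    """Mean reward plus a prior-weighted exploration bonus."""
    q = child.value_sum / child.visits if child.visits else 0.0
    return q + c * child.prior * math.sqrt(parent.visits) / (1 + child.visits)


def search(num_slots: int, simulations: int = 300, c: float = 1.0) -> List[int]:
    root = Node(prior=1.0)
    for _ in range(simulations):
        node, state, gained, path = root, (), 0.0, [root]
        while len(state) < num_slots:
            if not node.children:       # expansion: one child per masked slot
                for a in actions(state, num_slots):
                    node.children[a] = Node(prior=toy_confidence(a, state))
                break
            a, child = max(node.children.items(), key=lambda kv: puct(node, kv[1], c))
            gained += toy_confidence(a, state)
            state += (a,)
            node = child
            path.append(node)
        # Reward: mean confidence over the full order (path so far + lookahead).
        reward = (gained + rollout_sum(state, num_slots)) / num_slots
        for n in path:                  # backpropagation
            n.visits += 1
            n.value_sum += reward
    # Read off the most-visited path, then finish greedily if the tree is shallow.
    order: List[int] = []
    node = root
    while node.children:
        a, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        order.append(a)
    while len(order) < num_slots:
        st = tuple(order)
        order.append(max(actions(st, num_slots), key=lambda s: toy_confidence(s, st)))
    return order


best_order = search(num_slots=3)
print(best_order)  # the header slot (0) comes first in this toy
```

Confidence plays both roles described above: it seeds each child's prior at expansion time and it is the quantity averaged into the reward. No extra model is trained; the search only queries the (here simulated) confidence function at inference time.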
Architecture
Figure 1: Conceptual illustration of McDiffuSE solving a Python code generation task. It contrasts a standard sequential plan with an MCTS-optimized plan.
Evaluation Highlights
  • +19.5% absolute accuracy improvement on MBPP (code generation) compared to baseline plan-and-infill methods.
  • +4.9% accuracy gain on MATH500 compared to state-of-the-art plan-and-infill method ReFusion.
  • Matches or exceeds autoregressive model performance on 5 out of 6 reasoning benchmarks under identical experimental conditions.
Breakthrough Assessment
8/10
Significantly narrows the performance gap between diffusion and autoregressive models on reasoning tasks, demonstrating that search-based planning is a viable path for non-autoregressive generation.