MGDM: Multi-Granularity Diffusion Modeling—the proposed method that reweights the training loss by per-token difficulty so that learning focuses on hard subgoals
Subgoal Imbalance: The phenomenon where different steps (tokens) in a generation task vary significantly in difficulty, with some requiring much longer-term planning than others
Planning Distance (PD): A metric quantifying the difficulty of a subgoal, defined as the number of steps the model must look ahead to make a correct decision
Discrete Diffusion: A generative model that gradually corrupts discrete data (tokens) with noise (masking or randomizing) and learns to reverse this process to generate data
Easy-first Decoding: An inference strategy where the model commits to high-confidence (easy) tokens first and uses them as context to solve lower-confidence (hard) tokens later
Autoregressive (AR): Models that generate sequences one token at a time from left to right, conditioning each token only on previously generated ones
SAT: Boolean Satisfiability Problem—an NP-complete problem of determining whether there exists a truth assignment that satisfies a given Boolean formula
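To make the easy-first decoding entry concrete, the following is a minimal toy sketch, not the paper's implementation: a stand-in "model" assigns each masked position a confidence (here, hypothetically, higher when a neighbor is already revealed), and the decoder commits the most confident token each step, using revealed tokens as context for the rest. All names (`toy_model`, `easy_first_decode`, the confidence rule, and the memorized target string) are illustrative assumptions; a real decoder would rank positions by a trained denoiser's predictive probabilities.

```python
# Toy sketch of easy-first decoding for a masked discrete diffusion model.
# The "model" below is a hypothetical stand-in, not the paper's network.

MASK = "_"


def toy_model(seq):
    """Return (confidence, position, token) guesses for masked slots.

    Assumed heuristic: confidence grows with the number of already
    revealed neighbors, mimicking a denoiser that is surer when it
    has more local context.
    """
    guesses = []
    target = "ABCDE"  # toy ground truth the stand-in model "knows"
    for i, tok in enumerate(seq):
        if tok != MASK:
            continue
        revealed_neighbors = sum(
            1 for j in (i - 1, i + 1)
            if 0 <= j < len(seq) and seq[j] != MASK
        )
        conf = 0.5 + 0.25 * revealed_neighbors
        guesses.append((conf, i, target[i]))
    return guesses


def easy_first_decode(length, tokens_per_step=1):
    """Start fully masked; repeatedly commit the highest-confidence tokens."""
    seq = [MASK] * length
    while MASK in seq:
        # Sort so the easiest (most confident) guesses come first.
        guesses = sorted(toy_model(seq), reverse=True)
        for _conf, i, tok in guesses[:tokens_per_step]:
            seq[i] = tok  # commit an easy token; it becomes context later
    return "".join(seq)


print(easy_first_decode(5))  # prints "ABCDE"
```

Under this heuristic the decoder reveals tokens outward from whatever it committed first, which is the qualitative behavior the glossary entry describes: easy tokens are fixed early and serve as context for harder ones.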