MCTS: Monte Carlo Tree Search, a decision-making algorithm that explores possible future states to find the most promising move; widely used in game playing and, more recently, in multi-step reasoning
UCT: Upper Confidence bounds applied to Trees, the selection rule used in MCTS; it scores each child node by its average reward (exploitation) plus a bonus that is larger for nodes with few visits (exploration), and selects the highest-scoring child
Rollout: A simulation phase in MCTS where the model continues a reasoning path to a terminal state to estimate its value
Action Space: The set of all possible moves the model can take at a given state (e.g., decompose question, retrieve info, generate answer)
Backpropagation: In MCTS, the process of updating the statistics (value and visit count) of nodes along the path after a rollout simulation
Parallel Expansion: Evaluating multiple child nodes (actions) simultaneously rather than sequentially to speed up the search process
Retrieval Pruning: A mechanism to skip external retrieval if the model determines the current context is sufficient, saving computational cost
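
To make the UCT and Backpropagation entries concrete, here is a minimal sketch of both steps. It assumes the standard UCB1-style score (average reward plus c * sqrt(ln(parent visits) / child visits)) and a single-agent setting where the same reward is credited to every node on the path. The names Node, uct_score, select_child, and backpropagate, and the default constant c, are illustrative choices, not taken from any particular implementation.

```python
import math


class Node:
    """A search-tree node holding the statistics MCTS maintains."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        self.visits = 0         # how many rollouts passed through this node
        self.total_value = 0.0  # sum of rollout rewards seen at this node
        if parent is not None:
            parent.children.append(self)


def uct_score(child, parent_visits, c=1.414):
    """Average reward (exploitation) plus a visit-count bonus (exploration)."""
    if child.visits == 0:
        return math.inf  # unvisited children are always tried first
    exploit = child.total_value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    return exploit + explore


def select_child(node, c=1.414):
    """Pick the child with the highest UCT score."""
    return max(node.children, key=lambda ch: uct_score(ch, node.visits, c))


def backpropagate(node, reward):
    """Update visit count and value on every node from `node` up to the root."""
    while node is not None:
        node.visits += 1
        node.total_value += reward
        node = node.parent
```

A short usage example: after one rollout through child a, the unvisited child b is selected next; once both have been visited, the child with the higher average reward wins because the exploration bonuses are equal.

```python
root = Node()
a, b = Node(root), Node(root)
backpropagate(a, 1.0)          # rollout through a returned reward 1.0
assert select_child(root) is b # b is unvisited, so it is explored next
backpropagate(b, 0.0)          # rollout through b returned reward 0.0
assert select_child(root) is a # equal visit counts, a has the higher average
```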