MAV: Multi-Action-Value model—a Transformer trained to be a policy, value function, and world model simultaneously.
External Search: Using the LLM to guide a traditional search algorithm (like MCTS), where the LLM provides priors and values.
Internal Search: Training the LLM to generate a linearized text representation of a search tree in its context window to select the best move.
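PUCT: Predictor + Upper Confidence Bound applied to Trees, the rule used in AlphaZero-style MCTS to choose which child node to descend into, trading off a move's estimated value against an exploration bonus weighted by its prior.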
Virtual Counts: A technique in parallel MCTS that temporarily increases the visit counts of nodes still being evaluated, so that simultaneous simulations spread across different branches instead of all following the same path.
Centipawn: A unit of measure used in chess engines to evaluate the advantage of one side (100 centipawns = 1 pawn).
FEN: Forsyth-Edwards Notation—a standard text format for describing a particular board position of a chess game.
Elo: A rating system used to calculate the relative skill levels of players in zero-sum games.
Linearized Tree: Representing the branching structure of a search tree as a flat sequence of tokens (e.g., depth-first traversal) for LLM training.
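
To make the PUCT and Virtual Counts entries concrete, here is a minimal sketch of a PUCT selection step that counts in-flight simulations as visits. It is an illustration under stated assumptions, not the implementation used in any particular system: the `Edge` class, the `puct_select` function, the `c_puct` default of 1.5, and the `+ 1` under the square root (which keeps the prior active before any visits) are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Edge:
    """Statistics for one move out of a node (hypothetical structure)."""
    prior: float            # P(s, a), e.g. supplied by the model
    visits: int = 0         # completed simulations through this move
    value_sum: float = 0.0  # sum of values backed up through this move
    virtual: int = 0        # simulations currently in flight (virtual counts)

    def q(self) -> float:
        # Mean backed-up value; 0.0 for an unvisited move (an assumption).
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges: Dict[str, Edge], c_puct: float = 1.5) -> str:
    """Return the move maximizing Q + c_puct * P * sqrt(1 + N_total) / (1 + N)."""
    # Virtual counts are added to real visits, so moves that are already
    # being evaluated by another worker look less attractive.
    total = sum(e.visits + e.virtual for e in edges.values())

    def score(e: Edge) -> float:
        n = e.visits + e.virtual
        return e.q() + c_puct * e.prior * math.sqrt(1 + total) / (1 + n)

    return max(edges, key=lambda move: score(edges[move]))
```

In use, a worker would increment `virtual` on the selected move before evaluating the position and decrement it again when the result is backed up. With two unvisited moves whose priors are 0.6 and 0.4, the first selection picks the higher-prior move; once that move carries one virtual count, a second concurrent selection picks the other move.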