Reasoning Process Graph: A graph whose nodes are LLM thoughts and whose edges are dependencies between them; each node is classified to decide whether reasoning should continue, stop, or backtrack
GNN: Graph Neural Network, a deep learning model that processes graph-structured data to produce node embeddings
PPO: Proximal Policy Optimization, a reinforcement learning algorithm used here to train the GNN to select better reasoning modes
Actor-Critic: An RL architecture where the Actor decides actions (reasoning parameters) and the Critic estimates the value of the current graph state
Chain-of-Thought (CoT): A prompting technique where the model generates intermediate reasoning steps
Tree of Thoughts (ToT): A framework allowing LLMs to explore multiple reasoning paths in a tree structure
Graph of Thoughts (GoT): A framework modeling reasoning as an arbitrary graph
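Temperature: An LLM sampling hyperparameter that divides the output logits before the softmax; higher values make generation more random, lower values make it more deterministic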
Branching factor: The number of new thought nodes generated from a single parent node
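
To make a few of these terms concrete, here is a minimal Python sketch of a reasoning process graph, a branching factor, and temperature-scaled sampling probabilities. It is an illustration under stated assumptions, not the implementation described in the source: the names `Decision`, `ThoughtNode`, `ReasoningGraph`, `expand`, and `softmax_with_temperature` are hypothetical, and the node labels are set by hand here, whereas the source assigns them by classifying nodes.

```python
import math
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    """Hypothetical node labels: keep reasoning, stop, or backtrack."""
    CONTINUE = "continue"
    STOP = "stop"
    BACKTRACK = "backtrack"


@dataclass
class ThoughtNode:
    node_id: int
    text: str
    parents: list = field(default_factory=list)  # ids of thoughts this one depends on
    decision: Decision = Decision.CONTINUE


class ReasoningGraph:
    """Nodes are thoughts; an edge runs from each parent to a dependent thought."""

    def __init__(self):
        self.nodes = {}

    def add_thought(self, text, parents=()):
        node = ThoughtNode(len(self.nodes), text, list(parents))
        self.nodes[node.node_id] = node
        return node

    def expand(self, parent_id, candidate_texts, branching_factor):
        # Attach at most `branching_factor` new thoughts to a single parent.
        kept = candidate_texts[:branching_factor]
        return [self.add_thought(text, [parent_id]) for text in kept]

    def children(self, node_id):
        return [n for n in self.nodes.values() if node_id in n.parents]


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Small usage example with made-up thoughts.
graph = ReasoningGraph()
root = graph.add_thought("Problem statement")
kids = graph.expand(root.node_id, ["idea A", "idea B", "idea C"], branching_factor=2)
kids[1].decision = Decision.BACKTRACK  # set by hand; a trained classifier would do this

# A merge node with two parents is what separates a graph from a tree.
merged = graph.add_thought("combine A and B", [k.node_id for k in kids])
```

With a branching factor of 2, only the first two candidate thoughts are attached to the root. The `merged` node depends on both of them, which a Tree of Thoughts structure cannot express but a Graph of Thoughts structure can. Calling `softmax_with_temperature` on the same logits with a lower temperature concentrates more probability on the largest logit.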