GFlowNet: Generative Flow Network—a probabilistic model that learns to sample discrete objects (like molecules) with probability proportional to a reward
DAG: Directed Acyclic Graph—a structure where edges go in one direction without loops; used here to represent the step-by-step construction of objects
Soft RL: Entropy-Regularized Reinforcement Learning (also MaxEnt RL)—an RL variant that maximizes both reward and the entropy (randomness) of the policy
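As a hedged sketch of the entropy-regularized objective behind Soft RL (the temperature symbol α is an assumption of this illustration, not from the glossary):

```latex
J(\pi) = \mathbb{E}_{\tau \sim \pi}\Big[ \sum_{t} r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big]
```

Larger α favors more random policies; α → 0 recovers standard reward maximization.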
SoftDQN: An algorithm for Soft RL that learns soft Q-values (expected future reward plus an entropy bonus) to approximate the optimal soft policy
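A minimal sketch of the soft Bellman backup underlying SoftDQN; the names `q_values` and `temperature` are illustrative assumptions, not part of the glossary:

```python
import math

def soft_value(q_values, temperature):
    """Soft state value V(s) = tau * log(sum_a exp(Q(s, a) / tau))."""
    m = max(q_values)  # shift by the max for numerical stability
    return m + temperature * math.log(
        sum(math.exp((q - m) / temperature) for q in q_values))

def soft_backup(reward, gamma, next_q_values, temperature):
    """SoftDQN-style target for Q(s, a): r + gamma * V(s')."""
    return reward + gamma * soft_value(next_q_values, temperature)
```

As the temperature goes to zero, the soft value approaches the hard maximum over Q-values, recovering the standard DQN target.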
Munchausen DQN: M-DQN—An RL algorithm that augments the reward with a scaled log-probability of the current policy, which acts as an implicit KL-divergence penalty toward the previous policy; equivalent to a form of Soft RL
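A hedged sketch of the Munchausen reward augmentation; the scaling α, temperature τ, and target-policy notation are assumptions of this illustration:

```latex
\tilde{r}(s_t, a_t) = r(s_t, a_t) + \alpha \tau \log \pi_{\bar\theta}(a_t \mid s_t)
```

Because the added term is non-positive, it penalizes actions the current policy finds unlikely, which amounts to implicit KL regularization between successive policies.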
Trajectory Balance: TB—A GFlowNet loss function that enforces flow conservation along complete trajectories by matching the forward path probability, scaled by a learned partition function, against the reward-weighted backward path probability
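As a sketch, the TB loss for a complete trajectory τ = (s₀ → … → x) can be written as (Z_θ is the learned partition function, P_F and P_B the forward and backward policies):

```latex
\mathcal{L}_{\mathrm{TB}}(\tau) = \left( \log \frac{Z_\theta \prod_{t} P_F(s_{t+1} \mid s_t)}{R(x) \prod_{t} P_B(s_t \mid s_{t+1})} \right)^2
```

Driving this loss to zero for all trajectories makes the terminal sampling probability proportional to R(x).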
Detailed Balance: DB—A GFlowNet loss function enforcing flow consistency across individual edges: the flow through each edge must agree whether computed forward from the parent or backward from the child
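The per-edge constraint behind the DB loss can be sketched as (F denotes the learned state flow):

```latex
F(s) \, P_F(s' \mid s) = F(s') \, P_B(s \mid s')
```

In practice the DB loss penalizes the squared difference of the logarithms of the two sides over sampled edges.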
Bellman Equation: A recursive equation in RL that relates the value of a state to the expected value of the next state
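For concreteness, one standard form of the Bellman equation for the value of a policy π (γ is the discount factor) is:

```latex
V^\pi(s) = \mathbb{E}_{a \sim \pi,\; s' \sim P}\big[ r(s, a) + \gamma \, V^\pi(s') \big]
```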
Markovian Flow: A flow where the probability of moving to the next state depends only on the current state, not the history
Q-value: The expected cumulative future reward of taking a specific action in a specific state and following the policy thereafter
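A hedged, minimal tabular Q-learning update illustrating how Q-values are learned from experience; the step size `alpha` and the toy state names are assumptions of this sketch:

```python
def q_update(Q, s, a, reward, s_next, alpha=0.5, gamma=0.9):
    """Move Q(s, a) toward the bootstrapped target r + gamma * max_a' Q(s', a')."""
    target = reward + gamma * max(Q[s_next].values())
    Q[s][a] += alpha * (target - Q[s][a])
    return Q[s][a]

# Example usage on a two-state toy table:
Q = {"s0": {"left": 0.0, "right": 0.0}, "s1": {"left": 1.0, "right": 0.0}}
q_update(Q, "s0", "right", 1.0, "s1")  # pulls Q["s0"]["right"] toward 1.9
```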