Effective Horizon: A measure of MDP complexity roughly corresponding to the lookahead depth required to identify optimal actions using random rollouts at the leaves
BRIDGE: A new dataset of 155 deterministic MDPs (from Atari, Procgen, and MiniGrid) with full tabular representations, enabling exact theoretical analysis
GORP: Greedy Over Random Policy—a simple algorithm that estimates Q-values via random rollouts and acts greedily, used to define the effective horizon
Sample Complexity: The minimum number of timesteps needed for an algorithm to return an optimal policy with probability at least 1/2
Covering Length: The number of episodes needed to visit all state-action pairs at least once with probability 1/2 using random actions
PPO: Proximal Policy Optimization—a popular policy gradient Deep RL algorithm
DQN: Deep Q-Network—a popular value-based Deep RL algorithm
k-QVI-solvable: A property of an MDP where applying k steps of Q-value iteration, starting from the random policy's Q-function, produces a Q-function whose greedy policy is optimal
Effective Planning Window: A theoretical window W < T (where T is the full horizon) such that planning only W steps ahead is sufficient to act optimally
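To make the GORP definition above concrete, here is a minimal sketch of its simplest (k = 1) variant: at each timestep, estimate each action's Q-value under the uniformly random policy via Monte Carlo rollouts, then act greedily on the estimates. The `env_reset`/`env_step` interface and the toy MDP are assumptions for illustration, not the paper's actual code.

```python
import random

def gorp(env_reset, env_step, horizon, num_actions,
         rollouts_per_action=10, seed=0):
    """Greedy Over Random Policy (k=1 sketch) for a deterministic MDP.

    Assumed interface: env_reset() -> state,
    env_step(state, action) -> (next_state, reward).
    """
    rng = random.Random(seed)
    state = env_reset()
    actions_taken, total_reward = [], 0.0
    for t in range(horizon):
        q_estimates = []
        for a in range(num_actions):
            total_return = 0.0
            for _ in range(rollouts_per_action):
                s, r = env_step(state, a)
                ret = r
                # Continue to the horizon with uniformly random actions.
                for _ in range(t + 1, horizon):
                    s, r = env_step(s, rng.randrange(num_actions))
                    ret += r
                total_return += ret
            q_estimates.append(total_return / rollouts_per_action)
        # Act greedily on the rollout-based Q-value estimates.
        best = max(range(num_actions), key=q_estimates.__getitem__)
        state, r = env_step(state, best)
        total_reward += r
        actions_taken.append(best)
    return actions_taken, total_reward

# Hypothetical two-step toy MDP: action 1 from "start" leads to a state
# that pays reward 1 on every subsequent step; action 0 leads to a dead end.
def env_reset():
    return "start"

def env_step(state, action):
    if state == "start":
        return ("good" if action == 1 else "bad"), 0.0
    if state == "good":
        return "good", 1.0
    return "bad", 0.0

actions, ret = gorp(env_reset, env_step, horizon=2, num_actions=2)
```

When one-step rollout estimates like these suffice to identify the optimal action at every state, the MDP is easy in the sense captured by the effective horizon; harder MDPs require GORP to search exhaustively over k-step action sequences before the random rollouts begin.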