Stitching: The ability to combine parts of different sub-optimal trajectories to form a new, optimal trajectory that was never explicitly seen in the dataset.
Decision Transformer (DT): An offline RL method that treats policy learning as a sequence modeling problem, predicting actions conditioned on past states, past actions, and a desired future return (Return-to-Go).
Return-to-Go (RTG): The sum of future rewards from a specific timestep to the end of the episode.
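The RTG at every timestep can be computed with a single backward pass over the episode's rewards. A minimal sketch (the function name and undiscounted sum are illustrative assumptions):

```python
def returns_to_go(rewards):
    """Return-to-go at each timestep: the sum of rewards from t to the end of the episode.
    Computed in one backward pass by accumulating a running total."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running += rewards[t]
        rtg[t] = running
    return rtg

print(returns_to_go([1.0, 0.0, 2.0]))  # [3.0, 2.0, 2.0]
```

Note that rtg[0] is the full episode return, and each later entry drops the rewards already collected.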
Bellman Equation: A recursive equation, central to dynamic programming and temporal-difference methods, that expresses the value (Q-value) of a state-action pair in terms of the immediate reward and the value of the next state.
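As a concrete instance, a one-step Q-learning update applies the Bellman equation as a backup target. A minimal sketch (the learning rate, discount factor, and tabular dictionary representation are illustrative assumptions):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.5, gamma=0.99):
    """One Bellman backup: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q

Q = defaultdict(float)  # all Q-values start at 0.0
q_update(Q, "s0", "a0", 1.0, "s1", ["a0", "a1"])
print(Q[("s0", "a0")])  # 0.5
```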
Conservative Q-Learning (CQL): An algorithm that learns a lower-bound (conservative) estimate of the value function to prevent overestimation of unseen actions in offline RL.
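The conservatism in CQL comes from a regularizer that pushes Q-values down for actions the learned policy might take (via a log-sum-exp over all actions) while pushing the dataset action's Q-value up. A minimal sketch of that penalty term for a single state with discrete actions (the function name and plain-list representation are illustrative assumptions; in practice this is added to the standard Bellman error, weighted by a coefficient):

```python
import math

def cql_penalty(q_values, data_action_index):
    """CQL regularizer for one state: logsumexp over Q(s, .) minus Q(s, a_data).
    Minimizing this lowers Q for out-of-distribution actions relative to the
    action actually observed in the dataset."""
    m = max(q_values)  # subtract the max for numerical stability
    logsumexp = m + math.log(sum(math.exp(q - m) for q in q_values))
    return logsumexp - q_values[data_action_index]

print(cql_penalty([0.0, 0.0], 0))  # log(2) ~ 0.693
```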
Behavior Cloning (BC): Supervised learning where the policy is trained to mimic the actions in the dataset exactly.
n-step Bellman: A variation of the Bellman update that accumulates n steps of actual rewards before bootstrapping from the value estimate, reducing bias at the cost of increased variance.
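The n-step target sums n discounted rewards and then bootstraps once from the value of the state reached after n steps. A minimal sketch (the function name and arguments are illustrative assumptions):

```python
def n_step_target(rewards, bootstrap_value, gamma=0.5):
    """n-step Bellman target: sum_{k=0}^{n-1} gamma^k * r_k + gamma^n * V(s_n),
    where n = len(rewards) and bootstrap_value estimates V(s_n)."""
    g = 0.0
    for k, r in enumerate(rewards):
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * bootstrap_value

# Two real rewards, then bootstrap: 1 + 0.5*1 + 0.25*4
print(n_step_target([1.0, 1.0], bootstrap_value=4.0))  # 2.5
```

With n = 1 this reduces to the standard one-step Bellman target; larger n relies more on observed rewards and less on the (possibly biased) value estimate.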