RCSL: Return-Conditioned Supervised Learning—a paradigm in which policies are trained via supervised learning to output actions conditioned on a specified future return (e.g., Decision Transformer)
Off-Dynamics RL: Reinforcement learning where the training environment (source) has different transition dynamics than the deployment environment (target), but the same reward function
DARA: Dynamics-Aware Reward Augmentation—a prior method, designed for dynamic-programming-based offline RL, that modifies source-domain rewards to account for the dynamics shift by matching trajectory distributions across domains
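To make the reward-augmentation idea concrete, here is a minimal sketch in which the source reward is corrected by the log-ratio of target vs. source transition likelihoods; the 1-D Gaussian dynamics models, function name, and `eta` weight are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def dara_augmented_reward(r, s_next, mu_src, mu_tgt, sigma, eta=1.0):
    """Toy reward augmentation: add the log-likelihood ratio of the
    next state under target vs. source dynamics (1-D Gaussians)."""
    log_p_tgt = -0.5 * ((s_next - mu_tgt) / sigma) ** 2
    log_p_src = -0.5 * ((s_next - mu_src) / sigma) ** 2
    # The Gaussian normalizing constants share sigma and cancel in the ratio.
    return r + eta * (log_p_tgt - log_p_src)
```

When the two dynamics models agree, the correction vanishes and the reward is unchanged; transitions that are more plausible under the target dynamics get their reward increased.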
DT: Decision Transformer—an offline RL algorithm that models RL as a sequence modeling problem, predicting actions given states and desired returns
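As a toy illustration of the conditioning signal DT (and RCSL generally) is trained on, the return-to-go at each timestep is the sum of rewards from that timestep onward; the helper name here is ours:

```python
import numpy as np

def returns_to_go(rewards):
    # rtg[t] = sum of rewards from timestep t to the end of the trajectory,
    # computed as a reversed cumulative sum.
    return np.cumsum(rewards[::-1])[::-1]

print(returns_to_go(np.array([1.0, 0.0, 2.0])))  # [3. 2. 2.]
```

At training time the model sees (return-to-go, state, action) tokens; at test time the initial return-to-go is set to the desired performance.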
REAG: Return Augmented DT—the proposed method, which transforms the returns in the source-domain dataset so that they match the return statistics of the target domain
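One simple instantiation of such a return transformation is an affine rescaling: standardize the source returns, then map them onto the target domain's mean and standard deviation. This sketch is an assumption for illustration, not necessarily the paper's exact transform:

```python
import numpy as np

def augment_returns(src_returns, tgt_mean, tgt_std):
    # Standardize source-domain returns, then rescale so the augmented
    # returns have the target domain's mean and standard deviation.
    z = (src_returns - src_returns.mean()) / src_returns.std()
    return tgt_mean + tgt_std * z
```

After this transformation, conditioning DT on a target-domain return value queries the policy at a statistically meaningful point of the (augmented) training distribution.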
Laplace approximation: A technique to approximate a probability distribution with a Gaussian centered at its mode; used here to model return distributions
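The Laplace approximation fits a Gaussian whose mean is the mode of the distribution and whose variance is the negative inverse curvature of the log-density at that mode. A minimal 1-D sketch, assuming the mode is already known and using finite differences for the curvature:

```python
def laplace_approximation(log_density, mode, h=1e-4):
    # Second derivative of the log-density at the mode (central differences);
    # the Gaussian variance is its negative inverse.
    d2 = (log_density(mode + h) - 2.0 * log_density(mode)
          + log_density(mode - h)) / h**2
    return mode, -1.0 / d2

# Sanity check: applied to a Gaussian log-density with variance 4,
# the approximation recovers that same variance.
mu, var = laplace_approximation(lambda x: -0.5 * x**2 / 4.0, 0.0)
```

For a Gaussian the approximation is exact, which makes it a convenient parametric model for per-domain return distributions.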
Sim-to-Real gap: The difference in performance or behavior when transferring a policy from a simulation (source) to the real world (target) due to imperfect modeling