PPO: Proximal Policy Optimization—an RL algorithm that improves policies using a clipped surrogate objective to prevent destructively large updates
Diffusion Policy: A robot control policy that generates actions by iteratively denoising random noise, conditioned on observations
Consistency Model: A generative model distilled from a diffusion model that can generate samples (here, actions) in a single step, drastically reducing inference latency
DDIM: Denoising Diffusion Implicit Models, a sampling method for diffusion models that speeds up generation by denoising over a shorter subsequence of timesteps (still multi-step, and therefore slower than a single-step Consistency Model)
Action Chunking: Predicting a sequence of k future actions at once rather than only the next action, which improves temporal smoothness
OPE: Offline Policy Evaluation—methods to estimate the performance of a policy using historical data without running it on the real robot
Sim-to-real: Transferring policies trained in simulation to the real world; this paper focuses on Real-to-Real (training directly on hardware)
MDP: Markov Decision Process—the mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker
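
To make the "clipped surrogate objective" in the PPO entry concrete, the following is a minimal sketch in plain Python. It is not taken from this paper: the function name `ppo_clipped_objective`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions. It computes the standard per-sample term min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the probability ratio between the new and old policy and A is the advantage estimate, then averages over samples.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (a quantity to maximize).

    All names here are hypothetical; clip_eps=0.2 is an assumed default.
    logp_new / logp_old: log-probabilities of the taken actions under the
    new and old policies. advantages: advantage estimates for those actions.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s).
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Ratio of 2 with a positive advantage: the gain is capped at (1 + eps) * A.
capped = ppo_clipped_objective([math.log(2.0)], [0.0], [1.0])
# Ratio of 2 with a negative advantage: the penalty is not capped.
uncapped = ppo_clipped_objective([math.log(2.0)], [0.0], [-1.0])
```

Taking the minimum of the two terms removes any incentive to move the ratio beyond the clip range when doing so would increase the objective, which is how the large-update protection in the definition arises.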