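PA-RL: Policy-Agnostic Reinforcement Learning, the method proposed in the paper; it decouples action optimization from policy training, so the same RL procedure can be applied to different policy classes (e.g., diffusion or autoregressive policies)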
OpenVLA: A 7-billion-parameter generalist robot policy built on a Vision-Language-Action (VLA) architecture
Diffusion Policy: A policy that generates actions by iteratively denoising random noise, conditioned on the state
Autoregressive Policy: A policy that generates actions sequentially, one token (or action dimension) at a time, often using a Transformer
SAC: Soft Actor-Critic, a popular off-policy actor-critic algorithm that maximizes expected return plus an entropy bonus on the policy
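Cal-QL: Calibrated Q-Learning, an offline-to-online RL algorithm that learns a conservative value function calibrated so that it does not fall below the value of a reference (e.g., behavior) policy; this guards against overestimation while avoiding the excessive underestimation that can slow online fine-tuning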
IQL: Implicit Q-Learning, an offline RL method that never queries out-of-sample actions during value training; instead it fits the value function with expectile regression on dataset actions
NLL: Negative Log-Likelihood, the standard supervised learning loss; minimizing it trains the policy to assign high probability to the target actions
WidowX: The robotic arm used for the real-world manipulation experiments in the paper
CALVIN: A simulation benchmark for language-conditioned, long-horizon robotic manipulation tasks
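The PA-RL entry above describes optimizing actions separately from training the policy. The following is a minimal one-dimensional sketch of that idea, not the paper's implementation: it assumes a hypothetical Gaussian base policy `sample_base_action`, a hypothetical critic `q_value`, and an illustrative sample count `num_samples`. It shows only a sample-and-rank step: draw candidate actions from the base policy, keep the one the critic scores highest, and treat it as the target that a supervised (NLL) update would then distill into the policy.

```python
import random

def sample_base_action(state, rng):
    """Hypothetical base policy: Gaussian centered at 0.5 * state.
    Any policy class that can be sampled from could stand in here."""
    return rng.gauss(0.5 * state, 1.0)

def q_value(state, action):
    """Hypothetical critic whose maximum is at action == state."""
    return -(action - state) ** 2

def optimize_action(state, num_samples=32, seed=0):
    """Sample candidate actions from the base policy and keep the highest-Q one."""
    rng = random.Random(seed)
    candidates = [sample_base_action(state, rng) for _ in range(num_samples)]
    return max(candidates, key=lambda a: q_value(state, a))

state = 2.0
target = optimize_action(state)
# The policy would then be trained (e.g., by minimizing NLL) to reproduce
# `target` at `state`; the policy itself is never differentiated through the critic.
print(f"optimized action for state {state}: {target:.3f}")
```

Because the critic only ranks sampled actions, the policy only needs to support sampling and likelihood-based training, which is what makes the approach independent of the policy class.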
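The SAC entry above mentions the entropy bonus. One place it appears is the critic's target. Below is a minimal scalar sketch of the soft Bellman target in the common twin-critic form of SAC; the discount `gamma` and temperature `alpha` values are illustrative assumptions, and a real implementation would operate on batches.

```python
def sac_target(reward, done, next_q1, next_q2, next_log_prob, gamma=0.99, alpha=0.2):
    """Soft Bellman target for SAC's critics.

    next_q1, next_q2: target-critic values at (s', a') with a' sampled from pi(.|s').
    next_log_prob:    log pi(a'|s'); subtracting alpha * log_prob adds the entropy bonus.
    done:             1.0 if s' is terminal, else 0.0 (no bootstrapping past the end).
    """
    soft_value = min(next_q1, next_q2) - alpha * next_log_prob
    return reward + gamma * (1.0 - done) * soft_value

# Non-terminal example: 1.0 + 0.99 * (9.0 + 0.2) = 10.108
print(sac_target(1.0, 0.0, 10.0, 9.0, -1.0))
```

Taking the minimum of two critics is a standard way to curb overestimation; a lower `log_prob` (a more uncertain policy) raises the target, which is how entropy is rewarded.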
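The Cal-QL entry above refers to a conservative value function that is calibrated against a reference policy. A minimal scalar sketch of that calibration applied to a CQL-style penalty follows: Q-values of actions sampled from the learned policy are pushed down, but once they fall below the reference value (for example, a Monte Carlo return of the behavior policy) the push-down stops. The function name and arguments are hypothetical, and the sketch uses a plain mean over sampled actions rather than the log-sum-exp form used by some CQL variants.

```python
def cal_ql_regularizer(policy_qs, data_q, reference_value, alpha=1.0):
    """Calibrated conservative penalty for a single state s (illustrative).

    policy_qs:       Q(s, a) for actions a sampled from the learned policy.
    data_q:          Q(s, a_data) for the dataset action at s.
    reference_value: V_ref(s), e.g. a Monte Carlo return of the behavior policy.
    """
    # Clip from below: Q-values already under the reference value are not pushed down further.
    pushed_down = sum(max(q, reference_value) for q in policy_qs) / len(policy_qs)
    # Push policy-action values down and dataset-action values up.
    return alpha * (pushed_down - data_q)

print(cal_ql_regularizer([1.0, 5.0], data_q=4.0, reference_value=3.0))     # calibrated
print(cal_ql_regularizer([1.0, 5.0], data_q=4.0, reference_value=-1e9))    # reduces to the uncalibrated form
```

With a very low reference value the clipping never activates and the penalty matches the uncalibrated conservative term, which makes the effect of calibration easy to compare.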
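The IQL entry above notes that value training avoids out-of-sample actions. Expectile regression is what makes this possible: V(s) is fit to an upper expectile of Q(s, a) over dataset actions only. The sketch below shows the asymmetric squared loss and a toy gradient-descent fit on a fixed list of Q-values; `tau`, `lr`, and `steps` are illustrative assumptions.

```python
def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff^2,
    where diff = Q(s, a) - V(s) for a dataset action a."""
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff ** 2

def fit_value(q_samples, tau=0.7, lr=0.1, steps=2000):
    """Fit a scalar V to the tau-expectile of q_samples by gradient descent."""
    v = 0.0
    for _ in range(steps):
        # d/dv of weight * (q - v)^2 is -2 * weight * (q - v).
        grad = sum(
            -2.0 * (tau if q - v > 0 else 1.0 - tau) * (q - v) for q in q_samples
        ) / len(q_samples)
        v -= lr * grad
    return v

qs = [0.0, 1.0, 2.0, 3.0, 4.0]
print(fit_value(qs, tau=0.5))  # the 0.5-expectile is the mean
print(fit_value(qs, tau=0.9))  # larger tau moves V toward the best in-sample Q-values
```

With `tau = 0.5` the loss is symmetric and V converges to the mean; raising `tau` approximates a maximum over in-distribution actions without ever evaluating Q at actions outside the dataset.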
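The NLL entry above depends on how the policy defines the likelihood of an action. As one concrete case, the sketch below assumes a policy that outputs a diagonal Gaussian over action dimensions; the function names are hypothetical. Other policy classes define the likelihood differently (an autoregressive policy, for instance, sums the log-probabilities of its tokens).

```python
import math

def gaussian_nll(action, mean, std):
    """Negative log-likelihood of one action dimension under N(mean, std^2)."""
    return 0.5 * math.log(2.0 * math.pi * std ** 2) + (action - mean) ** 2 / (2.0 * std ** 2)

def policy_nll(actions, means, stds):
    """NLL of a full action vector, treating dimensions as independent (diagonal Gaussian)."""
    return sum(gaussian_nll(a, m, s) for a, m, s in zip(actions, means, stds))

# The loss is smallest when the predicted mean matches the target action.
print(policy_nll([0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
print(policy_nll([1.0, 0.0], [0.0, 0.0], [1.0, 1.0]))
```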