RLPD: Reinforcement Learning with Prior Data—an off-policy algorithm that learns online while drawing a large share of each training batch (typically half) from a static dataset of prior demonstrations
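A minimal sketch of the batch-mixing idea behind RLPD. The function name `sample_mixed_batch` and the 50/50 split are illustrative assumptions, not the paper's exact implementation:

```python
import random

def sample_mixed_batch(prior_data, online_buffer, batch_size, rng=random):
    # Hypothetical helper: draw half the batch from the static prior
    # dataset (demonstrations) and half from the online replay buffer,
    # so prior data keeps influencing every gradient update.
    half = batch_size // 2
    batch = rng.sample(prior_data, half) + rng.sample(online_buffer, batch_size - half)
    rng.shuffle(batch)
    return batch

demos = [("demo", i) for i in range(100)]
online = [("online", i) for i in range(100)]
batch = sample_mixed_batch(demos, online, 8)
```

Sampling this way means the demonstrations are never "used up": they stay in the training distribution for the entire run instead of being washed out by fresh online data.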
HIL-SERL: Human-in-the-Loop Sample-Efficient Robotic Reinforcement Learning—the system proposed in this paper
impedance control: A control strategy that regulates the dynamic relationship between force and motion, letting the robot behave compliantly (soft, spring-like) on contact so it does not damage objects or itself
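A one-dimensional sketch of the spring-damper law underlying impedance control; gains and the function name are illustrative assumptions, and real controllers operate per-axis in Cartesian or joint space:

```python
def impedance_force(x_desired, x, v, stiffness=100.0, damping=10.0):
    # Commanded force behaves like a virtual spring-damper pulling the
    # end-effector toward the desired position. Lower stiffness makes
    # contact softer: the robot yields instead of pushing rigidly.
    return stiffness * (x_desired - x) - damping * v
```

At the target with zero velocity the commanded force is zero; when pushed off-target, the restoring force grows only in proportion to the displacement, which is what makes contact compliant.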
proprioceptive state: The robot's internal sense of its own body position (e.g., joint angles, end-effector coordinates)
sparse reward: A reward signal that is only given upon successful completion of a task (e.g., +1 for success, 0 otherwise), as opposed to dense shaping rewards
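The sparse-reward definition above is simple enough to state as code; this minimal sketch (function name assumed) contrasts it with a hypothetical dense shaping term:

```python
def sparse_reward(success: bool) -> float:
    # +1 only on task success, 0 otherwise; no intermediate shaping.
    return 1.0 if success else 0.0

def dense_reward(distance_to_goal: float) -> float:
    # Illustrative shaped alternative: reward grows as the distance
    # to the goal shrinks, giving gradient signal at every step.
    return -distance_to_goal
```

Sparse rewards are easy to specify and hard to exploit, but they make exploration harder, which is one reason demonstrations and human corrections help.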
off-policy RL: Reinforcement learning where the algorithm learns from data collected by a different policy (e.g., past data or human demonstrations) rather than only the current policy
ResNet: Residual Network—a deep convolutional neural network architecture widely used for image recognition
DQN: Deep Q-Network—an RL algorithm that combines Q-learning with deep neural networks to handle high-dimensional state spaces
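At DQN's core is the Q-learning bootstrap target; a minimal sketch (function name and signature assumed), with the network's Q-values over next-state actions passed in as a plain list:

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    # Q-learning target: r + gamma * max_a' Q(s', a').
    # No bootstrap on terminal transitions (done=True).
    return reward + (0.0 if done else gamma * max(next_q_values))
```

The deep network is trained by regressing Q(s, a) toward this target; in practice `next_q_values` comes from a separate, slowly updated target network to stabilize learning.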