PAC: Perceiver-Actor-Critic—the proposed neural architecture adapting Perceiver-IO for scalable actor-critic RL
BC: Behavioral Cloning—supervised learning that mimics the actions in a dataset
MPO: Maximum a Posteriori Policy Optimisation—an RL algorithm that frames policy updates as weighted supervised learning
Offline RL: Reinforcement learning using a fixed dataset without interacting with the environment during training
Perceiver-IO: A transformer architecture that uses cross-attention to map high-dimensional inputs onto a smaller latent array, so that attention cost grows linearly rather than quadratically with input size
KL divergence: Kullback-Leibler divergence—a measure of how one probability distribution differs from a second, reference probability distribution
Proprioception: Sensing the position, movement, and orientation of the robot's own body parts
Scaling laws: Empirical power-law relationships between model size, dataset size, compute budget, and performance
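For discrete distributions, the KL divergence entry corresponds to the standard formula D_KL(P‖Q) = Σ_x P(x) log(P(x)/Q(x)). The sketch below computes it under a few stated assumptions: both distributions are equal-length Python lists of probabilities, and the natural log is used, so the result is in nats. The function name `kl_divergence` is illustrative and does not come from the PAC work.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as equal-length probability lists."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # 0 * log(0 / q) is taken as 0 by convention
        if q_i == 0.0:
            return math.inf  # P puts mass where the reference Q has none
        total += p_i * math.log(p_i / q_i)  # natural log, so the result is in nats
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(round(kl_divergence(p, q), 3))  # 0.511
print(round(kl_divergence(q, p), 3))  # 0.368
```

Swapping the arguments gives a different value. KL divergence is not symmetric, which is why the definition measures a distribution against a *reference* distribution.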