VLA: Vision-Language-Action model—a foundation model that takes images and text as input and outputs robot actions as text tokens
PPO: Proximal Policy Optimization—an RL algorithm that optimizes policies by taking small, stable update steps constrained by a clipping mechanism
RPRM: Robotic Process Reward Model—a vision-language model trained to predict the probability of future task success, providing dense rewards
LoRA: Low-Rank Adaptation—a parameter-efficient fine-tuning technique that freezes the pretrained weights and trains small low-rank matrices whose product is added to them
OOD: Out-of-Distribution—scenarios or data points that differ significantly from the training data
SigLIP: A vision encoder pretrained on image-text pairs with a sigmoid contrastive loss, used to process visual inputs
DinoV2: A self-supervised vision transformer (usually written DINOv2) used for extracting visual features
GAE: Generalized Advantage Estimation—a method that estimates the advantage function in RL as an exponentially weighted sum of temporal-difference residuals, trading a small bias for lower variance
LIBERO: A benchmark suite for lifelong robot learning with diverse manipulation tasks
FSDP: Fully Sharded Data Parallel—a distributed training technique that shards model parameters, gradients, and optimizer states across GPUs so that models too large for a single device can be trained
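
To make the GAE and PPO entries above concrete, here is a minimal sketch in plain Python. It is not taken from the source: the function names, the default values of gamma, lam, and eps, and the single-episode setup are all illustrative assumptions.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation for one finished episode.

    `values` holds one more entry than `rewards`: the last entry is the
    bootstrap value of the final state (0.0 if that state is terminal).
    The default gamma and lam are assumed, commonly used values.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum from the next one.
    for t in reversed(range(len(rewards))):
        # Temporal-difference residual at step t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clipped_term(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-sample PPO clipped surrogate (to be maximized).

    `ratio` is new_policy_prob / old_policy_prob for the sampled action;
    eps = 0.2 is an assumed clipping range.
    """
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the minimum removes any incentive to move the ratio
    # outside [1 - eps, 1 + eps] in the direction that helps the objective.
    return min(ratio * advantage, clipped_ratio * advantage)
```

With lam = 0 the estimator reduces to the one-step TD residual (low variance, more bias); with lam = 1 it becomes the discounted return minus the value baseline (less bias, more variance).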