BC: Behavior Cloning—supervised learning where a policy is trained to mimic expert demonstrations.
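As a minimal sketch (not any specific paper's code), BC reduces to supervised regression from expert states to expert actions; here a linear policy is fit to noise-free demonstrations by least squares:

```python
import numpy as np

# Behavior cloning as supervised learning (illustrative sketch):
# fit a linear policy to expert (state, action) pairs.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 4))     # expert-visited states
W_expert = rng.normal(size=(4, 2))
actions = states @ W_expert            # expert actions (noise-free here)

# Closed-form least-squares fit stands in for gradient training.
W_bc, *_ = np.linalg.lstsq(states, actions, rcond=None)
mse = float(np.mean((states @ W_bc - actions) ** 2))
print(mse)  # near zero: the clone matches the expert on its own data
```

In practice the policy is a neural network trained by minimizing the same imitation loss with SGD, but the supervised structure is identical.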
PPO: Proximal Policy Optimization—an on-policy reinforcement learning algorithm that stabilizes training by limiting how much the policy can change at each step.
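The "limiting how much the policy can change" idea is implemented by PPO's clipped surrogate objective; a sketch of that objective (the standard form from the PPO paper, not FLaRe-specific code):

```python
import numpy as np

# PPO clipped surrogate objective (standard textbook form).
# ratio = pi_new(a|s) / pi_old(a|s); clipping it to [1-eps, 1+eps]
# caps the incentive to move the policy far from the old one.
def ppo_clip_loss(ratio, advantage, eps=0.2):
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * advantage
    # PPO maximizes the elementwise minimum; the loss is its negation.
    return -np.minimum(unclipped, clipped)

# A ratio of 1.5 with positive advantage is clipped at 1 + eps = 1.2.
print(ppo_clip_loss(np.array([1.5]), np.array([1.0])))
```

Taking the minimum means the update only benefits from ratio changes inside the trust band, which is what stabilizes training.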
Sparse Reward: A reward signal given only upon task completion (e.g., +1 for success, 0 otherwise), as opposed to dense rewards that give feedback at every step.
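A toy contrast between the two reward styles (illustrative reward functions, not from the source):

```python
# Sparse: +1 only when the goal is reached, 0 otherwise.
def sparse_reward(state, goal):
    return 1.0 if state == goal else 0.0

# Dense (shaped): feedback at every step, e.g. negative distance to goal.
def dense_reward(state, goal):
    return -abs(goal - state)

trajectory = [0, 1, 2, 3]
print([sparse_reward(s, goal=3) for s in trajectory])  # [0.0, 0.0, 0.0, 1.0]
print([dense_reward(s, goal=3) for s in trajectory])   # [-3, -2, -1, 0]
```

Sparse rewards avoid the bias a hand-shaped signal can introduce, but make exploration harder since most trajectories return zero.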
Sim-to-Real: The process of transferring a policy trained in a physics simulator to a physical robot, often requiring domain randomization to handle visual/physical discrepancies.
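Domain randomization can be sketched as resampling simulator parameters per episode so the policy never overfits to one setting (parameter names here are hypothetical, chosen for illustration):

```python
import random

# Per-episode domain randomization sketch (hypothetical parameters):
# a policy trained across these variations must be robust enough to
# also handle the real robot's unknown parameters.
def randomize_domain(rng):
    return {
        "friction": rng.uniform(0.5, 1.5),
        "mass_scale": rng.uniform(0.8, 1.2),
        "light_intensity": rng.uniform(0.3, 1.0),
    }

rng = random.Random(0)
params = randomize_domain(rng)  # sampled fresh at each episode reset
print(sorted(params))
```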
KV-cache: Key-Value cache—a technique to speed up transformer inference by caching the attention keys and values computed for previous tokens, so each new token only requires computation for itself.
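A minimal sketch of the caching pattern (generic single-head attention, not any particular library's API): at each decoding step, only the new token's key and value are appended, while past ones are reused from the cache.

```python
import numpy as np

# Single-query attention over cached keys/values (illustrative).
def attend(q, K, V):
    scores = q @ K.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(0)
d = 8
K_cache, V_cache = [], []
for step in range(3):                 # autoregressive decoding steps
    k, v, q = rng.normal(size=(3, d)) # projections for the NEW token only
    K_cache.append(k)                 # reuse past K/V instead of recomputing
    V_cache.append(v)
    out = attend(q, np.stack(K_cache), np.stack(V_cache))
print(out.shape)
```

Without the cache, every step would recompute keys and values for the entire prefix, making generation quadratic in sequence length per token.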
SPOC: The transformer-based, multi-task mobile manipulation foundation model that FLaRe uses as its starting point for fine-tuning.
DINOv2: A computer vision foundation model used to extract robust visual features that generalize well between simulation and reality.
ProcTHOR: A framework for procedurally generating diverse simulated 3D environments (houses) for robot training.
Actor-Critic: An RL architecture with two networks: an Actor (which selects actions) and a Critic (which estimates the value of states).
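The two roles can be sketched with tiny linear "networks" (purely illustrative shapes, not a real training loop):

```python
import numpy as np

# Schematic actor-critic: the actor maps states to action preferences,
# the critic maps states to a scalar value used to form the advantage.
rng = np.random.default_rng(0)
state_dim, n_actions = 4, 3
W_actor = rng.normal(size=(state_dim, n_actions))  # actor parameters
w_critic = rng.normal(size=state_dim)              # critic parameters

state = rng.normal(size=state_dim)
logits = state @ W_actor                  # actor: action preferences
probs = np.exp(logits) / np.exp(logits).sum()
value = state @ w_critic                  # critic: state-value estimate

# One-step TD advantage: how much better the outcome was than expected.
reward, next_value, gamma = 1.0, 0.0, 0.99
advantage = reward + gamma * next_value - value
print(probs.shape, float(probs.sum()))
```

The advantage (rather than the raw return) is what scales the actor's policy-gradient update, which is how the critic reduces variance.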
On-policy: RL algorithms (like PPO) that learn strictly from data collected by the current version of the policy, which improves stability at the cost of sample efficiency, since old data cannot be reused.