ReST: Reinforced Self-Training—an algorithm where a model improves by generating its own data, filtering for high-quality samples (based on a reward), and retraining on them
Housekeep: A 3D simulated benchmark for household robotic agents focused on tidying and object rearrangement tasks
Scene Graph: A structured representation of the environment (rooms, receptacles, objects) that updates dynamically as the robot explores
IL: Imitation Learning—training an agent to mimic the behavior of an expert demonstrator
Affordance: The possibility of an action on an object or environment (e.g., a cup 'affords' being picked up)
NLL: Negative Log Likelihood, the loss minimized in supervised fine-tuning; minimizing it maximizes the probability the model assigns to the target tokens
DPO: Direct Preference Optimization, an alternative alignment method that trains on pairs of preferred and rejected samples; the authors avoid it due to API limitations
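
As an illustration of the ReST entry above, the generate, filter, retrain cycle can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the names `grow_and_filter`, `rest_loop`, `policy`, `reward`, and `finetune` are hypothetical, the policy is any callable mapping a prompt to an output, and the toy reward and fine-tuning functions stand in for a real reward signal and a real training step.

```python
from itertools import cycle
from typing import Callable, List, Tuple

Sample = Tuple[str, str]           # (prompt, output)
Policy = Callable[[str], str]      # hypothetical stand-in for a model


def grow_and_filter(policy: Policy, reward: Callable[[str, str], float],
                    prompts: List[str], samples_per_prompt: int,
                    threshold: float) -> List[Sample]:
    """Sample outputs from the policy and keep those whose reward meets the threshold."""
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            output = policy(prompt)
            if reward(prompt, output) >= threshold:
                kept.append((prompt, output))
    return kept


def rest_loop(policy: Policy, reward: Callable[[str, str], float],
              finetune: Callable[[Policy, List[Sample]], Policy],
              prompts: List[str], iterations: int,
              samples_per_prompt: int, threshold: float) -> Policy:
    """Alternate between collecting filtered self-generated data and retraining on it."""
    for _ in range(iterations):
        kept = grow_and_filter(policy, reward, prompts, samples_per_prompt, threshold)
        if kept:  # skip retraining when nothing passed the filter
            policy = finetune(policy, kept)
    return policy


# Toy stand-ins (assumptions for illustration only).
_candidates = cycle(["shelf", "sink", "floor"])

def toy_policy(prompt: str) -> str:
    return next(_candidates)          # deterministic "sampling"

def toy_reward(prompt: str, output: str) -> float:
    return 1.0 if output == "sink" else 0.0

def toy_finetune(policy: Policy, kept: List[Sample]) -> Policy:
    best = kept[0][1]                 # pretend training collapses onto a kept output
    return lambda prompt: best

improved = rest_loop(toy_policy, toy_reward, toy_finetune,
                     ["place the dirty mug"], iterations=1,
                     samples_per_prompt=3, threshold=1.0)
```

In a real setting, `finetune` would be a supervised fine-tuning step (for example, minimizing NLL on the kept samples), and the threshold could be raised across iterations.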