SARM: Stage-Aware Reward Modeling—the proposed framework for estimating task progress using hierarchical stage and subtask predictions
RA-BC: Reward-Aligned Behavior Cloning—a training method that weights imitation learning samples based on their estimated progress/reward
RBM: Robot Behavior Models—general-purpose policies that integrate perception and control for robotic tasks
CLIP: Contrastive Language-Image Pre-training—a model used here to encode visual observations into embeddings
Deformable Object: An object, such as fabric or clothing, that changes shape when manipulated, making state estimation and planning difficult
Behavior Cloning (BC): A supervised learning approach where a policy is trained to minimize the error between its predicted actions and expert demonstrations
Welford's Algorithm: A numerically stable method for computing running mean and variance, used here to normalize reward weights online
VLM: Vision-Language Model—models that process both images and text, often used as baselines for reward estimation
Subtask: A semantic segment of a long-horizon task (e.g., 'grasp left sleeve'), used to ground progress labels
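
To make the Welford's Algorithm entry above concrete, here is a minimal Python sketch of the running mean and variance update. The class name `RunningStats`, the `normalize` helper, and the `eps` parameter are hypothetical names chosen for illustration. How SARM or RA-BC actually applies these statistics to reward weights is not specified in this glossary, so the normalization step is an assumption.

```python
import math


class RunningStats:
    """Hypothetical helper: running mean/variance via Welford's update."""

    def __init__(self) -> None:
        self.count = 0    # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean              # deviation from the old mean
        self.mean += delta / self.count    # move the mean toward x
        self.m2 += delta * (x - self.mean) # old deviation times new deviation

    @property
    def variance(self) -> float:
        """Population variance of the samples seen so far (0.0 if none)."""
        return self.m2 / self.count if self.count > 0 else 0.0

    @property
    def std(self) -> float:
        return math.sqrt(self.variance)

    def normalize(self, x: float, eps: float = 1e-8) -> float:
        """Assumed usage: standardize a value with the running statistics."""
        return (x - self.mean) / (self.std + eps)
```

Each call to `update` costs constant time and memory, so the statistics can be maintained online as new values arrive, without storing past samples or summing large squared terms directly.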