World Model: A learned model that predicts the future states of an environment (e.g., future images) conditioned on current states and actions
Proprioception: Sensing the robot's own internal state, such as joint angles, velocities, and torques
Cosmos Tokenizer: A neural image and video tokenizer (an autoencoder) from NVIDIA, used here to compress manipulator camera images into latent embeddings
Conformal Prediction: A statistical framework that uses held-out calibration data to set thresholds on uncertainty scores; assuming the calibration and test data are exchangeable, it guarantees an upper bound on the error rate (e.g., the false alarm rate)
Non-conformity score: A scalar value quantifying how different a new observation is from the training (nominal) distribution
Latent Space: A lower-dimensional vector space into which data (like images) are encoded, where similar items tend to lie closer together, making downstream processing cheaper and simpler
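The glossary does not say how the non-conformity score is computed. One common choice, shown here only as an illustrative assumption, is the mean distance from a new latent embedding to its k nearest nominal (training) embeddings. The function name `knn_nonconformity` and the parameter `k` are hypothetical, and the embeddings are assumed to be plain NumPy vectors such as those an image tokenizer might produce.

```python
import numpy as np

def knn_nonconformity(z: np.ndarray, nominal: np.ndarray, k: int = 5) -> float:
    """Hypothetical score: mean Euclidean distance from latent z to its k nearest nominal latents.

    Larger values mean z lies farther from the nominal (training) distribution.
    """
    dists = np.linalg.norm(nominal - z, axis=1)  # distance to every nominal embedding
    k = min(k, len(dists))                        # cannot use more neighbors than we have
    nearest = np.partition(dists, k - 1)[:k]      # the k smallest distances (unordered)
    return float(nearest.mean())

# Usage: an embedding drawn far from the nominal cloud gets a higher score.
rng = np.random.default_rng(0)
nominal = rng.normal(size=(200, 16))
in_dist = knn_nonconformity(rng.normal(size=16), nominal)
out_dist = knn_nonconformity(rng.normal(loc=5.0, size=16), nominal)
```

Distances in latent space are used instead of raw pixels because, as the definition above notes, similar items tend to lie close together there.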
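To make the conformal prediction entry concrete, here is a minimal split-conformal sketch. It assumes higher non-conformity scores mean "more anomalous" and that calibration scores come from nominal data exchangeable with test data. The threshold is the ⌈(n+1)(1−α)⌉-th smallest calibration score, so flagging a score strictly above it bounds the false alarm rate by α. The function name `conformal_threshold` is hypothetical.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, alpha: float) -> float:
    """Hypothetical helper: split-conformal threshold on non-conformity scores.

    Under exchangeability, a nominal test score exceeds the returned value
    with probability at most alpha (the false alarm rate).
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return math.inf  # too few calibration points to guarantee this alpha
    return float(scores[rank - 1])

def is_anomalous(score: float, threshold: float) -> bool:
    # Raise an alarm only when the score is strictly above the threshold.
    return score > threshold
```

Smaller α gives fewer false alarms but a higher threshold, so fewer true anomalies are caught. With too little calibration data, no finite threshold can meet a very small α.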