Bronchoscopy: A medical procedure to look inside the lungs' airways using a flexible tube with a camera
Electromagnetic tracking (EM): A common localization method using magnetic fields to track instrument position, often used as a ground truth or auxiliary system in surgery
Sim-to-real: Transferring a robot policy trained in simulation (or controlled environments) to the real world
Visual Servoing: Controlling a robot's motion using feedback from a vision sensor so that the current camera view aligns with a target image
LPIPS: Learned Perceptual Image Patch Similarity, a metric that estimates how similar two images appear to a human; it often agrees with human judgment better than pixel-wise differences do
SSIM: Structural Similarity Index Measure, a metric for measuring image quality/similarity based on luminance, contrast, and structure
CBCT: Cone Beam CT, a type of X-ray imaging used intraoperatively to verify the robot's position inside the body
World Model: A predictive model that simulates how the environment (here, the next video frame) will change in response to a robot's action
EfficientNet-B0: A specific convolutional neural network architecture optimized for efficiency and accuracy, used here as the visual backbone
Transformer: A neural network architecture based on attention mechanisms, used here to process sequences of visual features
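
To make the SSIM entry above concrete, the following is a minimal sketch of the formula computed once over a whole grayscale image. This is a simplification: standard SSIM averages the same expression over small local windows. The function name `global_ssim` and its parameters are illustrative assumptions, not code from the source; the constants `k1 = 0.01` and `k2 = 0.03` are the values conventionally used with SSIM.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM over two same-shaped grayscale images.

    Hypothetical helper for illustration; standard SSIM applies this
    formula per local window and averages the results.
    """
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    if x.shape != y.shape:
        raise ValueError("images must have the same shape")

    # Small constants that stabilize the division when means or variances are near zero.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_x, mu_y = x.mean(), y.mean()                # luminance terms
    var_x, var_y = x.var(), y.var()                # contrast terms
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()      # structure term

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Under these assumptions, two identical images score 1, and the score drops as luminance, contrast, or structure diverge. For real evaluations, a windowed library implementation is the safer choice.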