Value Map: A 3D voxel grid where each voxel contains a scalar value representing the cost or reward of the robot's end-effector being at that location.
Affordance Map: A type of value map indicating regions the robot should interact with or move towards (e.g., a handle).
Constraint Map: A type of value map indicating regions the robot should avoid (e.g., an obstacle like a vase).
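MPC: Model Predictive Control, an optimal control method that uses a dynamics model to optimize a trajectory over a finite time horizon, executes only the first step of that trajectory, and then re-plans from the newly observed state (a receding horizon).
LMP: Language Model Program, a modular prompting structure in which an LLM generates code to solve a sub-task; that code may in turn call other LMPs recursively.
OWL-ViT: Vision Transformer for Open-World Localization, an open-vocabulary object detection model that can localize objects described by free-text queries.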
SAM: Segment Anything Model—a model that can generate segmentation masks for objects given prompts like bounding boxes.
Voxel: A volume element; the 3D analogue of a pixel, representing one cubic cell of a regular 3D grid.
Zero-shot: The ability to perform a task without having been explicitly trained on examples of that specific task.
6-DoF: Six Degrees of Freedom, referring to position (x, y, z) and orientation (roll, pitch, yaw).
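The voxel and value-map entries above can be illustrated with a small sketch. It assumes a cubic workspace discretized into n x n x n voxels and a cost convention (lower is better). The affordance map is assumed to grow with normalized Euclidean distance from a target voxel, and the constraint map is assumed to be a Gaussian bump around an obstacle voxel. All function names, the weights, and `sigma` are hypothetical choices for illustration, not the actual parameters of any particular system.

```python
import math
from itertools import product

# Hypothetical helpers: cost-style value maps over an n x n x n voxel grid,
# stored as nested lists indexed grid[x][y][z]. Lower cost = more desirable.

def world_to_voxel(point, origin, voxel_size, n):
    """Map a 3D point (e.g. in metres) to integer voxel indices, clamped to the grid."""
    return tuple(
        min(n - 1, max(0, math.floor((p - o) / voxel_size)))
        for p, o in zip(point, origin)
    )

def build_map(n, fn):
    """Fill an n x n x n grid by evaluating fn(x, y, z) at every voxel."""
    return [[[fn(x, y, z) for z in range(n)] for y in range(n)] for x in range(n)]

def affordance_map(n, target):
    """Assumed form: distance to the target voxel, normalized to 0 at the target and 1 at the far corner."""
    max_d = math.dist((0, 0, 0), (n - 1, n - 1, n - 1))
    return build_map(n, lambda x, y, z: math.dist((x, y, z), target) / max_d)

def constraint_map(n, obstacle, sigma=1.5):
    """Assumed form: Gaussian penalty equal to 1 at the obstacle and decaying with distance."""
    return build_map(
        n,
        lambda x, y, z: math.exp(-math.dist((x, y, z), obstacle) ** 2 / (2 * sigma ** 2)),
    )

def compose(aff, con, w_aff=1.0, w_con=1.0):
    """Combine an affordance map and a constraint map into one value map by weighted sum."""
    n = len(aff)
    return build_map(n, lambda x, y, z: w_aff * aff[x][y][z] + w_con * con[x][y][z])

def best_voxel(value_map):
    """Return the index of the lowest-cost voxel."""
    n = len(value_map)
    return min(product(range(n), repeat=3), key=lambda v: value_map[v[0]][v[1]][v[2]])

# Usage: a handle-like target at voxel (8, 8, 2) and a vase-like obstacle at (4, 4, 2).
n = 10
value_map = compose(affordance_map(n, (8, 8, 2)), constraint_map(n, (4, 4, 2)))
print(best_voxel(value_map))                                          # (8, 8, 2)
print(world_to_voxel((0.85, -0.3, 0.25), (0.0, 0.0, 0.0), 0.1, n))    # (8, 0, 2)
```

Summing weighted maps is only one simple composition rule; increasing `w_con` makes voxels near the obstacle relatively more expensive without changing the affordance map itself.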
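The MPC entry can be made concrete with a minimal receding-horizon loop. This sketch assumes a toy dynamics model in which the end-effector moves at most one voxel per step along x, y, or z, and it solves each finite-horizon problem by exhaustive search over discrete action sequences. That search is a simplification; practical controllers typically use sampling-based or gradient-based optimizers instead. The cost function stands in for a value map, and all names (`ACTIONS`, `dynamics`, `mpc_step`, `run_mpc`) are hypothetical.

```python
import math
from itertools import product

# Hypothetical discrete action set: stay, or move one voxel along +/- x, y, z.
ACTIONS = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def dynamics(state, action, n):
    """Toy dynamics model: apply a one-voxel move, clamped to the n x n x n grid."""
    return tuple(min(n - 1, max(0, s + a)) for s, a in zip(state, action))

def mpc_step(state, cost, n, horizon=3):
    """Search every action sequence over the horizon; return only the first action of the best one."""
    best_seq, best_cost = None, math.inf
    for seq in product(ACTIONS, repeat=horizon):
        s, total = state, 0.0
        for a in seq:
            s = dynamics(s, a, n)  # roll the model forward one step
            total += cost(s)       # accumulate predicted cost along the rollout
        if total < best_cost:
            best_seq, best_cost = seq, total
    return best_seq[0]

def run_mpc(start, cost, n, steps=20, horizon=3):
    """Receding-horizon loop: plan, execute the first action, then re-plan from the new state."""
    state, path = start, [start]
    for _ in range(steps):
        state = dynamics(state, mpc_step(state, cost, n, horizon), n)
        path.append(state)
    return path

# Usage: reach a goal voxel while a large penalty marks a single obstacle voxel.
goal, obstacle = (8, 8, 2), (4, 4, 2)

def cost(s):
    return math.dist(s, goal) + (100.0 if s == obstacle else 0.0)

path = run_mpc((0, 0, 2), cost, n=10)
print(path[-1], obstacle in path)  # (8, 8, 2) False
```

Re-planning at every step is what distinguishes MPC from executing a single open-loop plan: if the cost map or the observed state changes between steps, the next call to `mpc_step` takes the change into account.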