Allocentric: A spatial reference frame independent of the observer's position (e.g., a top-down map), contrasted with egocentric (first-person view)
Egocentric: A spatial reference frame relative to the observer's current viewpoint (e.g., what the camera sees)
AST: Allocentric-Spatial Tree—a directed acyclic graph representing scene objects with elliptical geometric parameters in a global coordinate system
MFM: Multimodal Foundation Model, a large AI model capable of processing text, images, and potentially video (e.g., GPT-4V, Gemini)
Voxel Grid: A 3D grid representation of space where each cell (voxel) indicates occupancy or semantic class
DBSCAN: Density-Based Spatial Clustering of Applications with Noise—a clustering algorithm used here to group 3D points into distinct object instances
SAM3: Segment Anything Model 3—an advanced instance segmentation model used to identify objects in 2D images
Depth Anything V3: A monocular depth estimation model used to predict pixel-wise depth from single images
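
To make the allocentric/egocentric distinction and the voxel-grid idea concrete, here is a minimal sketch, not the method used in the source. It converts a point from a camera-relative (egocentric) frame to a global (allocentric) frame and then finds the grid cell that contains it. The function names, the 2D simplification, and the axis convention (egocentric x forward, y left, yaw measured counter-clockwise from world +x) are all assumptions made for illustration.

```python
import math


def ego_to_allo(point_ego, cam_pos, cam_yaw):
    """Map a 2D point from the camera (egocentric) frame to the world (allocentric) frame.

    Assumed convention: ego x points forward, ego y points left, and
    cam_yaw is the camera heading in radians, counter-clockwise from world +x.
    """
    x, y = point_ego
    c, s = math.cos(cam_yaw), math.sin(cam_yaw)
    # Rotate by the camera heading, then translate by the camera position.
    return (cam_pos[0] + c * x - s * y, cam_pos[1] + s * x + c * y)


def voxel_index(point, voxel_size):
    """Return the integer grid cell containing a point (works for 2D or 3D tuples)."""
    return tuple(math.floor(v / voxel_size) for v in point)


# An object 2 m straight ahead of a camera at (1.2, 1.2) that faces world +y.
allo = ego_to_allo((2.0, 0.0), cam_pos=(1.2, 1.2), cam_yaw=math.pi / 2)
cell = voxel_index(allo, voxel_size=0.5)
```

The same object has egocentric coordinates (2.0, 0.0) regardless of where the camera stands, while its allocentric coordinates and voxel cell depend only on the scene, which is what makes the allocentric frame suitable for a shared map.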