VLM: Vision-Language Model—AI that connects text and images, used here to detect objects and score terrain relevance
ObjectNav: Object Goal Navigation—the task of navigating to an instance of an object category specified by text
voxel: The 3D analogue of a pixel; a small cube representing a fixed volume of the environment
frontier: The boundary between explored and unexplored space in a map
ADTR: Active Dual-Task Reasoning—the proposed module that selects keyframes and queries the VLM for semantics and verification
SGCP: Semantic-Geometric Coherent Planning—the proposed planning algorithm that balances high-value semantic targets with flight distance costs
3D value map: A spatial grid where each cell holds a probability score indicating how likely the target object is to be there
keyframe: A selected video frame preserved for processing because it contains new information or relevant objects
stop-and-infer: A robotic behavior in which the agent pauses its motion until a heavy computation (such as VLM inference) finishes
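
The voxel and frontier entries above can be made concrete with a small sketch. It assumes a sparse voxel map stored as a dictionary from integer (x, y, z) coordinates to a state, where any voxel missing from the dictionary counts as unexplored, and it treats a frontier voxel as a free voxel that shares a face with at least one unexplored voxel. The names `find_frontiers`, `FREE`, `OCCUPIED`, and `NEIGHBOR_OFFSETS` are hypothetical and chosen for illustration; this is a common textbook formulation, not necessarily how the system described here represents its map.

```python
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]

# Assumed states; voxels absent from the map are treated as unexplored.
FREE, OCCUPIED = 0, 1

# 6-connected neighbourhood: the voxels that share a face with a given voxel.
NEIGHBOR_OFFSETS: List[Voxel] = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def find_frontiers(voxel_map: Dict[Voxel, int]) -> List[Voxel]:
    """Return the free voxels that touch at least one unexplored voxel."""
    frontiers = []
    for (x, y, z), state in voxel_map.items():
        if state != FREE:
            continue  # occupied voxels are never frontiers
        for dx, dy, dz in NEIGHBOR_OFFSETS:
            if (x + dx, y + dy, z + dz) not in voxel_map:
                frontiers.append((x, y, z))
                break  # one unexplored neighbour is enough
    return sorted(frontiers)


# Toy map: a 3x3 slab of free voxels at z=0 with an observed floor and ceiling.
voxel_map: Dict[Voxel, int] = {}
for x in range(3):
    for y in range(3):
        voxel_map[(x, y, 0)] = FREE
        voxel_map[(x, y, 1)] = OCCUPIED   # ceiling
        voxel_map[(x, y, -1)] = OCCUPIED  # floor

frontiers = find_frontiers(voxel_map)
print(frontiers)
```

In this toy map the centre voxel (1, 1, 0) is surrounded by explored space and is not reported, while the eight free voxels around it each border unexplored space and are returned as frontiers.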