
ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

Dong An, Hanqing Wang, Wenguan Wang, Zun Wang, Yan Huang, Keji He, Liang Wang
University of Chinese Academy of Sciences, Beijing Institute of Technology, Zhejiang University, Australian National University
IEEE Transactions on Pattern Analysis and Machine Intelligence (2023)
MM Agent Memory Reasoning

📝 Paper Summary

Vision-Language Navigation (VLN) · Continuous Control · Topological Mapping
ETPNav enables robust navigation in continuous environments by building an online topological map from waypoints predicted from depth images, using that map for global planning, and pairing it with a trial-and-error controller that escapes obstacle-induced deadlocks.
Core Problem
Continuous navigation lacks the predefined graphs of discrete VLN, which makes long-range planning difficult, and standard controllers frequently get stuck on obstacles when sliding along them is forbidden.
Why it matters:
  • Directly predicting low-level actions is brittle on long-horizon tasks and yields lower success rates than discrete navigation
  • Local waypoint approaches lack global context for backtracking or correcting errors
  • Real-world robots and challenging simulators (RxR-CE) forbid 'sliding' along walls, causing navigation failures if the controller cannot handle collisions
Concrete Example: In a 'sliding-forbidden' environment, if an agent grazes a table while moving forward, a standard controller halts and fails. ETPNav's controller detects the deadlock, rotates to find a clear path, and resumes, preventing episode failure.
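The recovery behavior in this example can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `ToyAgent`, its `blocked` headings, the step length, and the probe offsets in `tryout_forward` are all hypothetical names and values chosen for the sketch. A deadlock is assumed to show up as a forward action that leaves the position unchanged.

```python
import math
from dataclasses import dataclass


@dataclass
class ToyAgent:
    """Hypothetical 2D agent in a sliding-forbidden world.

    `blocked` lists the headings (in degrees) along which a forward
    step collides, so the agent does not move at all.
    """
    x: float = 0.0
    y: float = 0.0
    heading: int = 0
    step: float = 0.25
    blocked: tuple = ()

    def forward(self) -> bool:
        """Attempt one forward step; return True only if the position changed."""
        if self.heading % 360 in self.blocked:
            return False  # collision: no sliding, so the agent stays put
        rad = math.radians(self.heading)
        self.x += self.step * math.cos(rad)
        self.y += self.step * math.sin(rad)
        return True


def tryout_forward(agent, offsets=(30, -30, 60, -60, 90, -90)):
    """Step forward; on a deadlock, probe other headings by trial and error.

    The offsets are an assumed probe schedule. After each probe the
    original heading is restored so the agent keeps facing its goal.
    """
    if agent.forward():
        return True
    original = agent.heading
    for off in offsets:
        agent.heading = (original + off) % 360
        moved = agent.forward()
        agent.heading = original  # face the goal again after the probe
        if moved:
            return True
    return False  # every probed heading was blocked
```

With `ToyAgent(blocked=(0, 30))`, the direct step and the first probe both collide, the probe at -30 degrees succeeds, and the agent ends up displaced while still facing its original heading.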
Key Novelty
Online Evolving Topological Planning with Obstacle Recovery
  • Constructs a topological graph on-the-fly by predicting 'ghost nodes' (reachable but unvisited waypoints) from depth images, organizing them into a map without prior environment knowledge
  • Decouples navigation into high-level graph planning (selecting a remote ghost node) and low-level control (robustly reaching it)
  • Introduces a 'Tryout' heuristic controller that actively detects collision deadlocks and rotates to escape them
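The mapping and planning ideas above can be illustrated with a small Python sketch. It is only a sketch under assumptions: the class `TopoMap`, the `merge_dist` threshold, the rule of merging a waypoint into the nearest existing node, and the externally supplied `scores` dictionary are hypothetical stand-ins. In ETPNav the waypoints come from a depth-based predictor and the ghost-node scores from a learned planner, neither of which is modeled here.

```python
import heapq
import math


class TopoMap:
    """Toy online topological map with visited nodes and ghost nodes."""

    def __init__(self, merge_dist=0.5):
        self.pos = {}        # node id -> (x, y)
        self.adj = {}        # node id -> {neighbor id: edge length}
        self.ghost = set()   # ids of reachable but unvisited nodes
        self.merge_dist = merge_dist  # assumed merging threshold
        self._next_id = 0

    def _new_node(self, p, ghost):
        nid = self._next_id
        self._next_id += 1
        self.pos[nid] = p
        self.adj[nid] = {}
        if ghost:
            self.ghost.add(nid)
        return nid

    def _link(self, a, b):
        d = math.dist(self.pos[a], self.pos[b])
        self.adj[a][b] = d
        self.adj[b][a] = d

    def _nearest(self, p):
        """Closest existing node within merge_dist of p, or None."""
        best = None
        for nid, q in self.pos.items():
            d = math.dist(p, q)
            if d <= self.merge_dist and (best is None or d < best[0]):
                best = (d, nid)
        return None if best is None else best[1]

    def visit(self, p, prev=None):
        """Record the agent's position; a nearby ghost node becomes visited."""
        nid = self._nearest(p)
        if nid is None:
            nid = self._new_node(p, ghost=False)
        self.ghost.discard(nid)
        if prev is not None and prev != nid:
            self._link(prev, nid)
        return nid

    def add_waypoints(self, cur, waypoints):
        """Attach predicted waypoints to the current node as ghost nodes."""
        for p in waypoints:
            nid = self._nearest(p)
            if nid is None:
                nid = self._new_node(p, ghost=True)
            if nid != cur:
                self._link(cur, nid)

    def shortest_path(self, src, dst):
        """Dijkstra over the current graph; returns a list of node ids or None."""
        dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                break
            if d > dist.get(u, math.inf):
                continue
            for v, w in self.adj[u].items():
                if d + w < dist.get(v, math.inf):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(heap, (d + w, v))
        if dst not in dist:
            return None
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return path[::-1]

    def plan(self, cur, scores):
        """Pick the best-scoring ghost node and return a path to it."""
        if not self.ghost:
            return None
        target = max(self.ghost, key=lambda n: scores.get(n, -math.inf))
        return self.shortest_path(cur, target)
```

For example, after `a = m.visit((0, 0))`, `m.add_waypoints(a, [(1, 0), (0, 1)])`, `b = m.visit((1.1, 0), prev=a)` and `m.add_waypoints(b, [(2, 0)])`, calling `m.plan(b, {2: 0.9, 3: 0.1})` returns `[1, 0, 2]`: the planner selects a remote ghost node and the path backtracks through an already visited node, which a purely local waypoint policy could not do. Reaching each node on the path would then be delegated to the low-level controller.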
Architecture
Figure 1: The hierarchical framework of ETPNav, showing the interaction between the Mapping, Planning, and Control modules during an episode.
Evaluation Highlights
  • +13% Success Rate (SR) improvement over RecBERT on R2R-CE Val-Unseen split
  • +25.99% SR improvement over RecBERT on the challenging RxR-CE Val-Unseen split (where sliding is forbidden)
  • System based on this algorithm won the CVPR 2022 RxR-Habitat Challenge, doubling the SDTW score of the second-best model
Breakthrough Assessment
8/10
Significant performance jump on the harder RxR-CE benchmark. Effectively bridges the gap between discrete graph-based planning and continuous control via online mapping.