
3D Reconstruction with Spatial Memory

Hengyi Wang, Lourdes Agapito
Department of Computer Science, University College London
International Conference on 3D Vision (2024)

📝 Paper Summary

Dense 3D Reconstruction · Incremental/Online Reconstruction · Memory-augmented Neural Networks
Spann3R converts a pairwise 3D reconstruction model into a real-time incremental system by using a spatial memory to preserve global geometry across frames without optimization.
Core Problem
State-of-the-art dense reconstruction methods like DUSt3R operate on image pairs and require slow, offline global optimization to align predictions, preventing real-time or incremental use.
Why it matters:
  • Traditional pipelines (SfM/SLAM) are brittle and complex, requiring separate steps for matching, triangulation, and bundle adjustment.
  • Current deep learning alternatives (DUSt3R) are robust but computationally heavy and non-sequential, limiting applications in robotics or AR that need on-the-fly geometry.
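Concrete Example: When DUSt3R reconstructs a video, it predicts geometry for every image pair independently, each in that pair's local coordinate frame, then runs a global alignment optimization (the role bundle adjustment plays in classical pipelines) to bring all pairs into one frame. This step can take minutes. Spann3R instead keeps the geometry seen so far in memory and predicts each new frame directly in the shared global coordinate frame, with no alignment step afterwards.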
Key Novelty
Spann3R (Spatial Memory for 3D Reconstruction)
  • Maintains an external 'Spatial Memory' that stores geometric features from previous frames, acting as a global coordinate reference.
  • Uses a transformer-based query mechanism to retrieve relevant past 3D information for the current frame, aligning it 'on-the-fly' like a spanner tightening bolts.
  • Separates memory into 'Working Memory' (recent frames, dense) and 'Long-term Memory' (consolidated/sparsified), mimicking human memory models to stay efficient.
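The memory mechanism described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the class name `SpatialMemory`, the helper `attention_read`, the capacity parameters, and the rule of keeping the most-attended long-term entries are all assumptions made for the example, and plain Python lists stand in for learned feature tensors.

```python
import math
from collections import deque


def attention_read(query, keys, values):
    """Return the softmax-weighted average of `values` and the weights used.

    Weights come from scaled dot-product similarity between `query` and each key.
    """
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return fused, weights


class SpatialMemory:
    """Hypothetical two-tier memory: dense recent entries plus a pruned archive."""

    def __init__(self, working_size=2, long_term_size=4):
        self.working = deque()   # most recent entries, kept in full
        self.long_term = []      # older entries, pruned to a fixed budget
        self.working_size = working_size
        self.long_term_size = long_term_size

    def write(self, key, value):
        """Store a new entry; overflow moves the oldest entry to long-term memory."""
        self.working.append({"key": key, "value": value, "usage": 0.0})
        if len(self.working) > self.working_size:
            self.long_term.append(self.working.popleft())
        if len(self.long_term) > self.long_term_size:
            # Assumed sparsification rule: keep the entries that past reads
            # attended to the most, and drop the rest.
            self.long_term.sort(key=lambda e: e["usage"], reverse=True)
            del self.long_term[self.long_term_size:]

    def read(self, query):
        """Fuse all stored values for `query`; returns None if memory is empty."""
        entries = self.long_term + list(self.working)
        if not entries:
            return None
        fused, weights = attention_read(
            query, [e["key"] for e in entries], [e["value"] for e in entries]
        )
        for entry, w in zip(entries, weights):
            entry["usage"] += w  # track how much each entry has been attended to
        return fused
```

In an incremental loop, each new frame would first call `read` with its own features to pull in geometry from earlier frames, then call `write` to store its result for later frames. The usage counter is one simple way to decide what survives in long-term memory; other pruning criteria would fit the same interface.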
Architecture
Architecture Figure (Figure 3)
The Spann3R inference pipeline showing how images are processed into pointmaps using spatial memory.
Evaluation Highlights
  • Achieves real-time online incremental reconstruction at over 50 frames per second (fps) without test-time optimization.
  • Demonstrates competitive reconstruction quality on unseen datasets (7Scenes, NRGBD, DTU) compared to offline optimization-based methods like FrozenRecon and DUSt3R.
  • Successfully processes both ordered video sequences and unordered image collections (via graph-based ordering).
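The summary does not say how the graph-based ordering is built, so the following is only one plausible scheme, shown to make the idea concrete. It assumes a precomputed symmetric pairwise similarity matrix `sim` (hypothetical; any pairwise confidence or overlap score could fill it) and grows an ordering greedily, in the style of a maximum spanning tree traversal. The function name `order_by_similarity` is likewise an assumption.

```python
def order_by_similarity(sim):
    """Order image indices so each new image is similar to one already placed.

    `sim[i][j]` is an assumed symmetric similarity score between images i and j.
    """
    n = len(sim)
    if n == 0:
        return []
    if n == 1:
        return [0]
    # Start from the most similar pair of images.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: sim[pair[0]][pair[1]],
    )
    order = [first, second]
    remaining = set(range(n)) - {first, second}
    while remaining:
        # Pick the unplaced image with the strongest link to any placed image.
        # Iterating in sorted order makes tie-breaking deterministic.
        nxt = max(sorted(remaining), key=lambda c: max(sim[c][o] for o in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

The resulting sequence could then be fed to a sequential reconstruction model as if it were a video, so that each frame has well-overlapping predecessors in memory.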
Breakthrough Assessment
8/10
Significant architectural leap: Spann3R converts a pairwise, offline foundation model (DUSt3R) into a real-time, sequential system through memory mechanisms, retaining robustness while gaining speed.