RESfM: Robust Deep Equivariant Structure from Motion

¹Weizmann Institute of Science, ²NVIDIA Research
ICLR 2025
The updated version will be posted soon.

Abstract

Multiview Structure from Motion is a fundamental and challenging computer vision problem. A recent deep learning approach used matrix-equivariant architectures to simultaneously recover camera poses and 3D scene structure from large image collections.
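To make "matrix equivariant" concrete, here is a minimal sketch of a layer that is equivariant to permutations of the rows (cameras) and columns (3D points) of a point-track tensor. It is an illustration only, not the architecture from the paper: the function names `init_layer` and `equivariant_layer`, the dense (fully observed) track tensor, and the feature sizes are all assumptions made for this example.

```python
import numpy as np

def init_layer(rng, d_in, d_out, scale=0.1):
    """Hypothetical helper: four weight matrices for one equivariant layer."""
    return tuple(scale * rng.normal(size=(d_in, d_out)) for _ in range(4))

def equivariant_layer(X, W_self, W_row, W_col, W_all):
    """X has shape (m cameras, n points, d_in); output is (m, n, d_out).

    Each entry is updated from itself, the mean of its row, the mean of
    its column, and the global mean. Reordering cameras or points
    therefore reorders the output in exactly the same way.
    """
    row_mean = X.mean(axis=1, keepdims=True)       # (m, 1, d_in): pooled over points
    col_mean = X.mean(axis=0, keepdims=True)       # (1, n, d_in): pooled over cameras
    all_mean = X.mean(axis=(0, 1), keepdims=True)  # (1, 1, d_in): pooled over everything
    return X @ W_self + row_mean @ W_row + col_mean @ W_col + all_mean @ W_all

# Assumed toy input: 4 cameras, 6 points, 2D image coordinates per entry.
rng = np.random.default_rng(0)
layer = init_layer(rng, d_in=2, d_out=8)
tracks = rng.normal(size=(4, 6, 2))
features = equivariant_layer(tracks, *layer)
```

In practice point tracks are sparse, since not every point is visible in every image, so a real implementation would pool only over observed entries; the dense version above is kept for brevity.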

That work, however, made the unrealistic assumption that the point tracks given as input are nearly free of outliers. Here, we propose an architecture suited to dealing with outliers by adding a multiview inlier/outlier classification module that respects the model's equivariance and by utilizing a robust bundle adjustment step.
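As a rough illustration of why a robust cost matters in bundle adjustment, the sketch below compares a Huber reprojection cost with the plain squared cost on a single camera. The abstract does not say which robust scheme the method uses; the Huber loss, the threshold `delta`, and the function names here are assumptions chosen because Huber is a common choice.

```python
import numpy as np

def project(P, X):
    """Project 3D points X (n, 3) with a 3x4 camera matrix P to pixels (n, 2)."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]                    # perspective division

def huber(r, delta=1.0):
    """Huber loss on non-negative residual norms r: quadratic near zero, linear beyond delta."""
    return np.where(r <= delta, 0.5 * r**2, delta * (r - 0.5 * delta))

def robust_reprojection_cost(P, X, obs, delta=1.0):
    """Sum of Huber losses over the reprojection errors of one camera."""
    r = np.linalg.norm(project(P, X) - obs, axis=1)
    return huber(r, delta).sum()

# Assumed toy setup: identity camera, three points at depth 2.
P = np.hstack([np.eye(3), np.zeros((3, 1))])
X = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 1.0, 2.0]])
obs = project(P, X)
obs[0] += np.array([10.0, 0.0])                    # corrupt one observation by 10 pixels

robust_cost = robust_reprojection_cost(P, X, obs, delta=1.0)
squared_cost = 0.5 * np.sum(np.linalg.norm(project(P, X) - obs, axis=1) ** 2)
```

With `delta=1.0`, the corrupted observation contributes 9.5 to the Huber cost versus 50 to the squared cost, so a single bad track pulls far less on the optimized cameras and points.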

Experiments demonstrate that our method can be applied successfully in realistic settings with large image collections and point tracks that are extracted with common heuristics and contain many outliers. It achieves state-of-the-art accuracy in almost all runs, superior to existing deep learning methods and on par with leading classical (non-deep) sequential and global methods.

BibTeX


    @inproceedings{
    khatib2025resfm,
    title={{RES}fM: Robust Deep Equivariant Structure from Motion},
    author={Fadi Khatib and Yoni Kasten and Dror Moran and Meirav Galun and Ronen Basri},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=wldwEhQ7cl}
    }