TY - GEN
T1 - When Epipolar Transformers Meets Implicit Neural Super-Resolution in Multi-View Stereo
AU - Song, Boyang
AU - Xiao, Jin
AU - Hu, Xiaoguang
AU - Zhang, Guofeng
AU - Shi, Jiaqi
AU - Jiang, Hao
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - Learning-based Multi-View Stereo (MVS) methods heavily rely on feature extraction and cost volume construction to bridge 2D semantics and 3D spatial associations. However, many recent studies have increasingly shifted focus away from these two critical steps, often adopting a multi-stage cascaded framework that can propagate errors from earlier stages. To this end, we introduce TTINS-MVSNet, a novel progressive refinement framework that integrates a one-stage MVS module, enhanced by two types of Epipolar Transformer (ET), with subsequent depth optimization modules. Specifically, we incorporate an intra-view ET into context-aware feature extraction and an inter-view ET into visibility-aware cost aggregation. A significant innovation in our approach is the introduction of an implicit neural super-resolution module, designed to recover finer details. Experimental results on benchmark datasets demonstrate that our method outperforms all previous approaches with similar structures, highlighting its effectiveness and generalization capability. The code will be available after publication.
AB - Learning-based Multi-View Stereo (MVS) methods heavily rely on feature extraction and cost volume construction to bridge 2D semantics and 3D spatial associations. However, many recent studies have increasingly shifted focus away from these two critical steps, often adopting a multi-stage cascaded framework that can propagate errors from earlier stages. To this end, we introduce TTINS-MVSNet, a novel progressive refinement framework that integrates a one-stage MVS module, enhanced by two types of Epipolar Transformer (ET), with subsequent depth optimization modules. Specifically, we incorporate an intra-view ET into context-aware feature extraction and an inter-view ET into visibility-aware cost aggregation. A significant innovation in our approach is the introduction of an implicit neural super-resolution module, designed to recover finer details. Experimental results on benchmark datasets demonstrate that our method outperforms all previous approaches with similar structures, highlighting its effectiveness and generalization capability. The code will be available after publication.
KW - deep learning
KW - epipolar transformer
KW - implicit neural representation
KW - multi-view stereo
KW - three-dimensional reconstruction
UR - https://www.scopus.com/pages/publications/105022602083
U2 - 10.1109/ICME59968.2025.11210006
DO - 10.1109/ICME59968.2025.11210006
M3 - Conference contribution
AN - SCOPUS:105022602083
T3 - Proceedings - IEEE International Conference on Multimedia and Expo
BT - 2025 IEEE International Conference on Multimedia and Expo
PB - IEEE Computer Society
T2 - 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Y2 - 30 June 2025 through 4 July 2025
ER -