When Epipolar Transformers Meets Implicit Neural Super-Resolution in Multi-View Stereo

  • Boyang Song
  • Jin Xiao*
  • Xiaoguang Hu
  • Guofeng Zhang
  • Jiaqi Shi
  • Hao Jiang
  • *Corresponding author for this work
  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Learning-based Multi-View Stereo (MVS) methods rely heavily on feature extraction and cost volume construction to bridge 2D semantics and 3D spatial associations. However, many recent studies have shifted focus away from these two critical steps, often adopting a multi-stage cascaded framework that can propagate errors from earlier stages. To address this, we introduce TTINS-MVSNet, a novel progressive refinement framework that integrates a one-stage MVS module enhanced by two types of Epipolar Transformer (ET) with subsequent depth optimization modules. Specifically, we incorporate an intra-view ET into context-aware feature extraction and an inter-view ET into visibility-aware cost aggregation. A significant innovation in our approach is the introduction of an implicit neural super-resolution module, designed to recover finer details. Experimental results on benchmark datasets demonstrate that our method outperforms all previous approaches with similar structures, highlighting its effectiveness and generalization capability. The code will be available after publication.

Original language: English
Title of host publication: 2025 IEEE International Conference on Multimedia and Expo
Subtitle of host publication: Journey to the Center of Machine Imagination, ICME 2025 - Conference Proceedings
Publisher: IEEE Computer Society
ISBN (Electronic): 9798331594954
DOIs
State: Published - 2025
Event: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025 - Nantes, France
Duration: 30 Jun 2025 to 4 Jul 2025

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (Print): 1945-7871
ISSN (Electronic): 1945-788X

Conference

Conference: 2025 IEEE International Conference on Multimedia and Expo, ICME 2025
Country/Territory: France
City: Nantes
Period: 30/06/25 to 4/07/25

Keywords

  • deep learning
  • epipolar transformer
  • implicit neural representation
  • multi-view stereo
  • three-dimensional reconstruction