Abstract
Given a set of calibrated images, Multiple View Stereo (MVS) applies an end-to-end depth inference network to recover scene structure. However, previous methods designed pixel-visibility modules to aggregate cross-view cost, ignoring the consistency assumption of 2D contextual features in the 3D depth direction. Current multi-stage depth inference models also rely on dense depth samples, which incurs high memory consumption. To alleviate these problems, this work exploits an edge-assisted epipolar Transformer for multi-view depth inference. The improvements of this work are summarized as follows: 1) An epipolar Transformer block is developed for reliable cross-view cost aggregation, and an edge detection branch is designed to constrain the consistency of epipolar geometry and edge features. 2) A dynamic depth range sampling mechanism based on the probability volume is adopted to improve accuracy in uncertain areas. Comprehensive comparisons with state-of-the-art works indicate that our work can reconstruct dense scene representations under a limited memory bottleneck.
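The abstract does not spell out how the depth range is derived from the probability volume. As a rough illustration only, the sketch below narrows the depth interval for a single pixel using the mean and standard deviation of its depth distribution, so confident pixels get a tight range and uncertain pixels keep a wide one. The variance-based rule, the function name `refine_depth_range`, and the parameters `num_samples` and `k` are all assumptions made for this example and are not taken from the paper.

```python
import math


def refine_depth_range(depths, probs, num_samples=4, k=1.5):
    """Narrow one pixel's depth range from its probability distribution.

    Hypothetical helper: the interval mean +/- k * std is an assumed rule,
    not the mechanism described in the paper.
    """
    if not depths or len(depths) != len(probs):
        raise ValueError("depths and probs must be non-empty and equal length")
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")

    # Normalise so the weights form a proper distribution over hypotheses.
    weights = [p / total for p in probs]

    # Expected depth and its spread along the depth direction.
    mean = sum(w * d for w, d in zip(weights, depths))
    var = sum(w * (d - mean) ** 2 for w, d in zip(weights, depths))
    std = math.sqrt(var)

    # Assumed interval, clipped to the range of the previous stage.
    lo = max(min(depths), mean - k * std)
    hi = min(max(depths), mean + k * std)

    # Degenerate interval: every new hypothesis collapses onto the mean.
    if num_samples == 1 or hi <= lo:
        return mean, std, [mean] * num_samples

    # Uniformly spaced hypotheses for the next stage.
    step = (hi - lo) / (num_samples - 1)
    return mean, std, [lo + i * step for i in range(num_samples)]
```

Under these assumptions, a sharply peaked distribution yields a much narrower set of hypotheses than a flat one, which is how fewer depth samples per stage could keep memory use down while still covering ambiguous regions.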
| Original language | English |
|---|---|
| Pages (from-to) | 701-711 |
| Number of pages | 11 |
| Journal | IEEE Transactions on Automation Science and Engineering |
| Volume | 22 |
| State | Published - 2025 |
Keywords
- MVS
- cost aggregation
- depth inference
- epipolar transformer
Fingerprint
Dive into the research topics of 'Edge-Assisted Epipolar Transformer for Industrial Scene Reconstruction'. Together they form a unique fingerprint.