TY - JOUR
T1 - Learnable patchmatch and self-teaching for multi-frame depth estimation in monocular endoscopy
AU - Shao, Shuwei
AU - Pei, Zhongcai
AU - Chen, Weihai
AU - Wu, Xingming
AU - Liu, Zhong
N1 - Publisher Copyright: © 2025 Elsevier Ltd
PY - 2025/12/20
Y1 - 2025/12/20
AB - This work investigates unsupervised monocular depth estimation in endoscopy, which leverages adjacent frames to provide supervisory signals during training. For many clinical applications, e.g., surgical navigation, temporally correlated frames are also available at test time. However, most existing monocular methods struggle to make effective use of temporal information during both training and inference, primarily because of challenges inherent to endoscopic imagery, including low- or homogeneous-texture regions and brightness fluctuations between frames. To fully exploit the temporal information in endoscopic scenes, we propose a novel unsupervised multi-frame monocular depth estimation model. The proposed model integrates a learnable patchmatch module that adaptively increases discriminative ability in regions with low or homogeneous texture, and enforces cross-teaching and self-teaching consistencies that provide effective regularization against brightness fluctuations. Furthermore, as a byproduct of the self-teaching paradigm, the proposed model can improve its depth predictions when more frames are provided at test time. We conduct detailed experiments on multiple datasets, and the results indicate that the proposed method outperforms prior state-of-the-art methods. The source code and trained models will be publicly available at https://github.com/ShuweiShao/FrameDepth.
KW - Endoscopy
KW - Multi-frame depth estimation
KW - Unsupervised learning
UR - https://www.scopus.com/pages/publications/105017421597
U2 - 10.1016/j.engappai.2025.112463
DO - 10.1016/j.engappai.2025.112463
M3 - Article
AN - SCOPUS:105017421597
SN - 0952-1976
VL - 162
JO - Engineering Applications of Artificial Intelligence
JF - Engineering Applications of Artificial Intelligence
M1 - 112463
ER -