TY - GEN
T1 - Self-Supervised Learning for Monocular Depth Estimation on Minimally Invasive Surgery Scenes
AU - Shao, Shuwei
AU - Pei, Zhongcai
AU - Chen, Weihai
AU - Zhang, Baochang
AU - Wu, Xingming
AU - Sun, Dianmin
AU - Doermann, David
N1 - Publisher Copyright: © 2021 IEEE
PY - 2021
Y1 - 2021
N2 - Self-supervised learning algorithms that compute depth maps from monocular videos have achieved remarkable performance on urban scenes and have been applied extensively. These techniques still face significant challenges, however, when applied directly to endoscopic videos, because of frame-to-frame brightness variations and inadequate representation learning during the training phase. Inspired by the use of optical flow for motion alignment between adjacent frames, we design an AFNet with a structural stability loss and a residual-based smoothness loss to learn the appearance flow across adjacent frames, which handles the brightness inconsistency issue effectively. In addition, we propose a novel self-attention mechanism, the feature scaling module, to alleviate the inadequate representation learning problem. In a comparison on the SCARED dataset with current state-of-the-art self-supervised methods developed for urban videos, the developed model surpasses existing methods by a large margin.
AB - Self-supervised learning algorithms that compute depth maps from monocular videos have achieved remarkable performance on urban scenes and have been applied extensively. These techniques still face significant challenges, however, when applied directly to endoscopic videos, because of frame-to-frame brightness variations and inadequate representation learning during the training phase. Inspired by the use of optical flow for motion alignment between adjacent frames, we design an AFNet with a structural stability loss and a residual-based smoothness loss to learn the appearance flow across adjacent frames, which handles the brightness inconsistency issue effectively. In addition, we propose a novel self-attention mechanism, the feature scaling module, to alleviate the inadequate representation learning problem. In a comparison on the SCARED dataset with current state-of-the-art self-supervised methods developed for urban videos, the developed model surpasses existing methods by a large margin.
UR - https://www.scopus.com/pages/publications/85119884105
U2 - 10.1109/ICRA48506.2021.9561508
DO - 10.1109/ICRA48506.2021.9561508
M3 - Conference contribution
AN - SCOPUS:85119884105
T3 - Proceedings - IEEE International Conference on Robotics and Automation
SP - 7159
EP - 7165
BT - 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 IEEE International Conference on Robotics and Automation, ICRA 2021
Y2 - 30 May 2021 through 5 June 2021
ER -