
Windowed bundle adjustment framework for unsupervised learning of monocular depth estimation with U-Net extension and clip loss

  • Lipu Zhou*
  • Michael Kaess

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

This letter presents a self-supervised framework for learning depth from monocular videos. The main contributions of this letter are: (1) We present a windowed bundle adjustment framework to train the network. Compared to most previous works, which only consider constraints from consecutive frames, our framework increases the camera baseline and introduces more constraints to avoid overfitting. (2) We extend the widely used U-Net architecture by applying a Spatial Pyramid Net (SPN) and a Super Resolution Net (SRN). The SPN fuses information from an image spatial pyramid for depth estimation, which addresses the context-information attenuation problem of the original U-Net. The SRN learns to estimate a high-resolution depth map from a low-resolution image, which benefits the recovery of details. (3) We adopt a clip loss function to handle moving objects and occlusions, which previous works addressed by designing complicated networks or requiring extra information (such as a segmentation mask [1]). Experimental results show that our algorithm achieves state-of-the-art results on the KITTI benchmark.
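The clip-loss idea described in contribution (3) — capping large per-pixel photometric errors so that moving objects and occluded regions cannot dominate the training signal — can be sketched as follows. This is a minimal illustration using NumPy; the percentile-based threshold and the function name are assumptions for exposition, not the paper's exact formulation.

```python
import numpy as np

def clip_loss(photometric_error, clip_percentile=95.0):
    """Cap per-pixel photometric error at a percentile threshold.

    Pixels whose error exceeds the threshold (often caused by moving
    objects or occlusions that violate the static-scene assumption)
    contribute only the clipped value, so they cannot dominate the
    gradient. The percentile value here is an illustrative assumption.
    """
    threshold = np.percentile(photometric_error, clip_percentile)
    clipped = np.minimum(photometric_error, threshold)
    return clipped.mean()

# A small batch of per-pixel errors with one outlier (e.g. a moving car):
errors = np.array([0.10, 0.12, 0.09, 0.11, 5.00])
loss = clip_loss(errors)  # the 5.00 outlier is capped, not averaged in full
```

Without clipping, a single dynamic-object pixel with a large reprojection error would pull the mean loss (and hence the depth gradients) far more than the many correctly modeled static pixels.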

Original language: English
Article number: 9013050
Pages (from-to): 3283-3290
Number of pages: 8
Journal: IEEE Robotics and Automation Letters
Volume: 5
Issue number: 2
DOIs
State: Published - Apr 2020
Externally published: Yes

Keywords

  • Monocular depth estimation
  • visual odometry
