
Multimodal Monocular Dense Depth Estimation with Event-Frame Fusion Using Transformer

  • Baihui Xiao
  • Jingzehua Xu
  • Zekai Zhang
  • Tianyu Xing
  • Jingjing Wang*
  • Yong Ren
  • *Corresponding author for this work
  • Tsinghua University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Frame cameras struggle to estimate depth maps accurately under abnormal lighting conditions. Event cameras, in contrast, offer high temporal resolution and high dynamic range: they capture sparse, asynchronous event streams that record per-pixel brightness changes, which addresses these limitations of frame cameras. However, the information in asynchronous events remains underexploited, which limits the ability of event cameras to predict dense depth maps. Fusing event streams with frame data can substantially improve monocular depth estimation accuracy, especially in complex scenes. In this study, we introduce a novel transformer-based depth estimation framework that combines event and frame data. The framework has two main components: a multimodal encoder and a joint decoder. The multimodal encoder uses self-attention to model the interactions between frame patches and event tensors, capturing both local and global spatiotemporal dependencies. This multi-scale fusion exploits the complementary strengths of the event and frame inputs. The joint decoder incorporates a dual-phase, triple-scale feature fusion module that extracts contextual information and produces detailed depth predictions. Experimental results on the EventScape and MVSEC datasets show that our method achieves new state-of-the-art performance.
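The encoder described in the abstract relies on self-attention across frame patches and event tensors. The pure-Python sketch below illustrates only the generic mechanism: single-head scaled dot-product self-attention over the concatenation of frame-patch tokens and event tokens, so that every frame token can attend to every event token and vice versa. It is not the authors' implementation. The token counts, the embedding width, the random projection matrices `w_q`, `w_k`, `w_v`, and all helper names are hypothetical, and the paper's multi-scale fusion and joint decoder are not modeled.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists.
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(row) for row in zip(*m)]


def joint_self_attention(frame_tokens, event_tokens, w_q, w_k, w_v):
    """Single-head self-attention over frame and event tokens (hypothetical sketch).

    Both modalities are assumed to be already embedded to the same width.
    Concatenating them into one sequence lets each frame token attend to
    event tokens and vice versa.
    """
    tokens = frame_tokens + event_tokens          # (n_frame + n_event) x dim
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = math.sqrt(len(w_k[0]))                # sqrt of the key dimension
    scores = matmul(q, transpose(k))              # pairwise token affinities
    attn = [softmax([s / scale for s in row]) for row in scores]
    fused = matmul(attn, v)                       # attention-weighted values
    return fused, attn


def rand_matrix(rows, cols):
    # Random stand-in for learned weights or embeddings (illustration only).
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
dim = 4
frame_tokens = rand_matrix(3, dim)   # hypothetical: 3 embedded frame patches
event_tokens = rand_matrix(2, dim)   # hypothetical: 2 embedded event-tensor tokens
w_q, w_k, w_v = (rand_matrix(dim, dim) for _ in range(3))

fused, attn = joint_self_attention(frame_tokens, event_tokens, w_q, w_k, w_v)
# attn[0][3]: weight the first frame patch places on the first event token
print(len(fused), len(fused[0]), round(attn[0][3], 3))
```

In a trained model the projections would be learned and the attention would typically be multi-head; placing both modalities in one token sequence is simply the most direct way to let self-attention see cross-modal pairs, chosen here for brevity.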

Original language: English
Host publication title: Artificial Neural Networks and Machine Learning – ICANN 2024 - 33rd International Conference on Artificial Neural Networks, Proceedings
Editors: Michael Wand, Kristína Malinovská, Jürgen Schmidhuber, Igor V. Tetko
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 419-433
Number of pages: 15
ISBN (Print): 9783031723346
DOI
Publication status: Published - 2024
Event: 33rd International Conference on Artificial Neural Networks, ICANN 2024 - Lugano, Switzerland
Duration: 17 Sep 2024 → 20 Sep 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15017 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 33rd International Conference on Artificial Neural Networks, ICANN 2024
Country/Territory: Switzerland
City: Lugano
Period: 17/09/24 → 20/09/24
