
Fusion-Mamba for Cross-Modality Object Detection

  • Beihang University
  • East China Normal University
  • Key Laboratory of Precision Opto-Mechatronics Technology (Ministry of Education)
  • Tencent
  • Eastern Institute of Technology, Ningbo
  • Zhongguancun Laboratory
  • Nanchang Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Cross-modality object detection aims to fuse complementary information from different modalities to improve model performance, enabling a wider range of applications. However, traditional cross-modality fusion methods, based on CNNs or Transformers, inadequately suppress pseudo-target information, which disperses the model's attention and degrades detection performance. In this paper, we investigate a novel cross-modality fusion approach that associates cross-modal features in a hidden state space based on an improved Mamba with a gating attention mechanism. We propose the Fusion-Mamba Block (FMB), designed to map cross-modal features into a hidden state space for interaction, thereby refining the model's attention on true target areas and enhancing overall performance. The FMB comprises two key modules: the State Space Channel Swapping (SSCS) module, which facilitates the fusion of shallow features, and the Dual State Space Fusion (DSSF) module, which enables deep fusion and effectively suppresses pseudo-target information within the hidden state space. Our proposed method outperforms state-of-the-art approaches, achieving improvements of 5.9%, 3.5% and 2.1% mAP on M3FD, DroneVehicle and FLIR-Aligned, respectively. To the best of our knowledge, this work establishes a new baseline for cross-modality object detection, providing a robust foundation for future research in this area.
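The abstract describes two fusion stages: shallow channel swapping between modalities (SSCS) and a gated deep fusion (DSSF). The paper's actual modules operate in a Mamba state space; as a rough intuition only, the following NumPy sketch illustrates the two generic ideas of channel swapping and gated fusion on toy feature maps. All function names, the swap ratio, and the sigmoid gate are hypothetical simplifications, not the authors' implementation.

```python
import numpy as np

def channel_swap(f_a, f_b, ratio=0.5):
    """Toy channel swapping: exchange the first `ratio` fraction of
    channels between two modality feature maps of shape [C, H, W].
    This only mimics the information-exchange idea behind SSCS."""
    c = int(f_a.shape[0] * ratio)
    out_a, out_b = f_a.copy(), f_b.copy()
    out_a[:c], out_b[:c] = f_b[:c], f_a[:c]  # originals on the RHS
    return out_a, out_b

def gated_fusion(f_a, f_b):
    """Toy gated fusion: an element-wise sigmoid gate blends the two
    modalities, loosely analogous to a gating attention mechanism."""
    gate = 1.0 / (1.0 + np.exp(-(f_a - f_b)))  # sigmoid of the difference
    return gate * f_a + (1.0 - gate) * f_b      # convex combination

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.standard_normal((8, 4, 4))   # stand-in for an RGB feature map
    ir = rng.standard_normal((8, 4, 4))    # stand-in for an infrared feature map
    rgb_s, ir_s = channel_swap(rgb, ir)
    fused = gated_fusion(rgb_s, ir_s)
    print(fused.shape)  # (8, 4, 4)
```

Because the gate is a convex combination, each fused value stays between the two modality values, so neither stream can be entirely discarded at any position.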

Original language: English
Pages (from-to): 7392-7406
Number of pages: 15
Journal: IEEE Transactions on Multimedia
Volume: 27
DOI
Publication status: Published - 2025
