
STEM-DETR: multimodal remote sensing object detection based on improved spatial-temporal feature enhancement

  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review

Abstract

Object detection remains a key task in remote sensing image processing. When the task is applied to multimodal, sequential images, fusing complementary information from multiple modalities and extracting temporal information from image sequences remain difficult but rewarding problems. Most current object detection networks cannot process images that are both multimodal and sequential. Their results also suffer from the widely varying object sizes in remote sensing images, misalignment and redundancy between modalities, and the difficulty of preserving long-range temporal information. To address these problems, this research proposes a multimodal remote sensing object detection method based on improved spatial-temporal feature enhancement. The proposed model, Spatial-Temporal Enhanced Multimodal DETR (STEM-DETR), supports object detection on RGB-T multimodal sequential images. We extend the typical end-to-end object detection pipeline of DETR with two dedicated modules: the RGB-T mixed attention merging module and the global spatial-temporal enhancement module. The RGB-T mixed attention merging module performs feature-level fusion between modalities, while the global spatial-temporal enhancement module builds on DETR's object queries, selecting high-confidence queries across the temporal sequence and using them to enhance the others. To validate the effectiveness of the method, we conduct thorough ablation studies and comparison experiments. In these experiments, STEM-DETR achieves up to 75.3 AP50 on our custom dataset, outperforming YOLOV++, SuperYOLO, and TransVOD. Visualizations of the model's outputs further support these results. The results show that our method is both effective and adaptable.
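The abstract describes the global spatial-temporal enhancement module only at a high level. What follows is a minimal, hypothetical sketch of the general idea, not the paper's implementation. It assumes that each frame's decoder yields object-query vectors with confidence scores, that the top-k queries pooled across all frames form a memory bank, and that each query is enhanced by residual scaled dot-product attention onto that bank. The names `select_high_confidence`, `enhance_queries`, and `top_k` are illustrative assumptions.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_high_confidence(
    frames_queries: List[List[Vector]],
    frames_scores: List[List[float]],
    top_k: int,
) -> List[Vector]:
    """Hypothetical step: pool queries from every frame, keep the top_k by score."""
    pooled = []
    for queries, scores in zip(frames_queries, frames_scores):
        pooled.extend(zip(scores, queries))
    # Sort on the score only, highest confidence first.
    pooled.sort(key=lambda pair: pair[0], reverse=True)
    return [query for _, query in pooled[:top_k]]


def enhance_queries(queries: List[Vector], memory: List[Vector]) -> List[Vector]:
    """Hypothetical step: residual scaled dot-product attention onto the memory bank."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    enhanced = []
    for q in queries:
        # Similarity of this query to each high-confidence memory query.
        logits = [scale * sum(a * b for a, b in zip(q, m)) for m in memory]
        weights = softmax(logits)
        # Attention-weighted sum of memory vectors.
        context = [sum(w * m[d] for w, m in zip(weights, memory)) for d in range(dim)]
        # Residual connection: original query plus aggregated context.
        enhanced.append([qd + cd for qd, cd in zip(q, context)])
    return enhanced


# Toy usage with two frames of two 2-D queries each.
frames_queries = [
    [[1.0, 0.0], [0.0, 1.0]],  # frame 0
    [[0.5, 0.5], [0.2, 0.1]],  # frame 1
]
frames_scores = [[0.9, 0.2], [0.7, 0.1]]
memory = select_high_confidence(frames_queries, frames_scores, top_k=2)
# memory -> [[1.0, 0.0], [0.5, 0.5]]
enhanced = enhance_queries(frames_queries[1], memory)
```

Pooling across all frames before the top-k cut is what lets a confident query from one frame inform a weaker query in another. A real DETR-style decoder would typically use learned projections and multi-head attention rather than raw dot products; this sketch omits them for brevity.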

Original language: English
Host publication title: AOPC 2025
Host publication subtitle: Optical Sensing, Imaging, Communications, Display, and Biomedical Optics
Editor: Yadong Jiang
Publisher: SPIE
ISBN (Electronic): 9781510698604
DOI
Publication status: Published - 28 Oct 2025
Event: AOPC 2025: Optical Sensing, Imaging, Communications, Display, and Biomedical Optics - Beijing, China
Duration: 24 Jun 2025 → 27 Jun 2025

Publication series

Name: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 13958
ISSN (Print): 0277-786X
ISSN (Electronic): 1996-756X

Conference

Conference: AOPC 2025: Optical Sensing, Imaging, Communications, Display, and Biomedical Optics
Country/Territory: China
City: Beijing
Period: 24/06/25 → 27/06/25

