Abstract
RGB-T tracking combines RGB and thermal infrared modalities, using the useful information in each to localize objects more robustly. Most existing trackers based on convolutional neural networks (CNNs) and Transformers focus on integrating multi-modal features through cross-modal attention, but overlook that the complementary information learned by that attention can itself be exploited to enhance the modal features. In this paper, we propose a novel hierarchical progressive fusion network based on cross-modal attention guided enhancement for RGB-T tracking. Specifically, the complementary information generated by cross-modal attention implicitly reflects the regions of interest that are consistent across modalities, and it is used to enhance the modal features in a targeted manner. In addition, a modal feature refinement module and a fusion module, both based on dynamic routing, are designed to suppress noise in the enhanced multi-modal features and integrate them adaptively. Extensive experiments on GTOT, RGBT234, LasHeR and VTUAV show that our method performs competitively with recent state-of-the-art methods.
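As a rough illustration of the idea described in the abstract, the following minimal sketch lets each modality attend to the other and then uses the attended (complementary) features to re-weight its own features. It is not the authors' network: the function names, the sigmoid gate, the absence of learned projections, and the final concatenation (standing in for the dynamic-routing refinement and fusion modules) are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_attention(query_feats, context_feats):
    """Scaled dot-product attention in which tokens of one modality
    (query_feats, shape (N, d)) attend to tokens of the other modality
    (context_feats, shape (M, d)). Learned projections are omitted here
    (assumption: identity query/key/value maps)."""
    dim = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(dim)  # (N, M)
    weights = softmax(scores, axis=-1)                      # each row sums to 1
    return weights @ context_feats                          # (N, d)


def enhance(feats, complementary):
    """Hypothetical enhancement step: a sigmoid gate derived from the
    complementary features re-weights the original features, added back
    residually so the original signal is never discarded."""
    gate = 1.0 / (1.0 + np.exp(-complementary))
    return feats + feats * gate


# Toy token features: 16 tokens with 8 channels per modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))
tir = rng.standard_normal((16, 8))

# Each modality is enhanced using what it retrieves from the other.
rgb_enhanced = enhance(rgb, cross_modal_attention(rgb, tir))
tir_enhanced = enhance(tir, cross_modal_attention(tir, rgb))

# Placeholder fusion by channel concatenation (assumption; the paper
# describes a dynamic-routing fusion module instead).
fused = np.concatenate([rgb_enhanced, tir_enhanced], axis=-1)
```

Because the gate lies between 0 and 1, every enhanced value keeps its sign and is scaled by a factor between 1 and 2, which is one simple way to emphasize regions both modalities agree on without suppressing the original features.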
| Original language | English |
|---|---|
| Pages (from-to) | 276-280 |
| Number of pages | 5 |
| Journal | IEEE Signal Processing Letters |
| Volume | 33 |
| DOI | |
| Publication status | Published - Nov 2025 |
Fingerprint
Explore the research topics of 'Cross-Modal Attention Guided Enhanced Fusion Network for RGB-T Tracking'. Together they form a unique fingerprint.