Abstract
Existing state-of-the-art RGB-D saliency detection models mainly use depth information as a complementary cue to enhance the RGB information. However, depth maps are easily influenced by the environment and are therefore often noisy. Thus, indiscriminately integrating multi-modality (i.e., RGB and depth) features may produce noise-degraded saliency maps. In this paper, we propose a novel Adaptive Fusion Network (AFNet) to solve this problem. Specifically, we design a triplet encoder network consisting of three subnetworks that process RGB, depth, and fused features, respectively. The three subnetworks are interlinked and form a grid net to facilitate mutual refinement of these multi-modality features. Moreover, we propose a Multi-modality Feature Interaction (MFI) module to exploit complementary cues between the depth and RGB modalities and adaptively fuse the multi-modality features. Finally, we design the Cascaded Feature Interweaved Decoder (CFID) to exploit complementary information between multi-level features and refine them iteratively to achieve accurate saliency detection. Experimental results on six commonly used benchmark datasets verify that the proposed AFNet outperforms 20 state-of-the-art counterparts in terms of six widely adopted evaluation metrics. Source code will be publicly available at https://github.com/clelouch/AFNet upon paper acceptance.
| Original language | English |
|---|---|
| Pages (from-to) | 152-164 |
| Number of pages | 13 |
| Journal | Neurocomputing |
| Volume | 522 |
| DOI | |
| Publication status | Published - 14 Feb 2023 |
Fingerprint
Dive into the research topics of 'Adaptive fusion network for RGB-D salient object detection'. Together they form a unique fingerprint.