
Multi-model cooperative denoising for robust cross-modal retrieval with noisy labels

  • Man Wu
  • Hengmiao Zhang
  • Jing Fang*
  • Yang Yang
  • Xiong Luo
  • *Corresponding author for this work
  • University of Science and Technology Beijing
  • Zhengzhou University
  • Aviation Data Communication Corporation
  • State Key Laboratory of CNS/ATM

Research output: Contribution to journal, Article, peer-reviewed

Abstract

In recent years, the growing volume of multimodal data has drawn sustained interest to the Cross-Modal Retrieval (CMR) task, which shows great potential for applications such as multimedia management and intelligent search. Most approaches map data from different modalities into a common representation space, where label information is used to separate samples belonging to different semantic classes. However, labels of multimodal data in real-world applications can be noisy, which greatly degrades the performance of existing methods that depend on clean labels. In this work, we introduce a novel Multi-Model Cooperative Denoising (MMCD) approach that adaptively clarifies data labels and filters out noisy ones for robust cross-modal retrieval. First, we train multiple CMR models simultaneously to learn cross-modal correspondence more robustly. Second, we propose an adaptive multimodal noise filtering mechanism, which dynamically selects clean labels based on the voting results of the multiple models. Third, we propose a denoising cross-modal contrastive learning mechanism, which reduces the impact of noisy labels and enables the CMR models to learn refined semantic correspondence across modalities. Extensive experiments on three CMR benchmark datasets show that the proposed MMCD outperforms state-of-the-art methods in handling noisy labels.
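
The abstract does not state the voting rule used to select clean labels, so the following is only a minimal sketch of how agreement among several models could filter noisy labels. It assumes each model outputs one predicted class per sample and treats a label as clean when enough models agree with it. The function name `vote_clean_mask` and the `min_agree` threshold are hypothetical illustrations, not taken from the paper.

```python
from typing import List, Sequence


def vote_clean_mask(
    predictions: Sequence[Sequence[int]],
    given_labels: Sequence[int],
    min_agree: int,
) -> List[bool]:
    """Flag a sample as clean when at least `min_agree` models predict its given label.

    predictions[m][i] is the class predicted by model m for sample i.
    This agreement rule is an assumption made for illustration only.
    """
    n_samples = len(given_labels)
    for model_preds in predictions:
        if len(model_preds) != n_samples:
            raise ValueError("every model must provide one prediction per sample")

    mask = []
    for i, label in enumerate(given_labels):
        # Count how many models agree with the (possibly noisy) given label.
        votes = sum(1 for model_preds in predictions if model_preds[i] == label)
        mask.append(votes >= min_agree)
    return mask


# Three hypothetical models, four samples.
preds = [
    [0, 1, 2, 1],
    [0, 1, 1, 1],
    [0, 2, 2, 1],
]
labels = [0, 1, 2, 0]  # the last label disagrees with every model

clean = vote_clean_mask(preds, labels, min_agree=2)
print(clean)
```

Raising `min_agree` makes the filter stricter: with three models, requiring all three to agree keeps only samples on which the models are unanimous.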

Original language: English
Article number: 57
Journal: Multimedia Systems
Volume: 32
Issue: 1
Publication status: Published - Feb 2026
