Abstract
In recent years, the growing volume of multimodal data has drawn sustained interest to the Cross-Modal Retrieval (CMR) task, which shows great potential for applications such as multimedia management and intelligent search. Most approaches map data from different modalities into a common representation space, where label information is used to separate samples belonging to different semantic classes. However, labels of multimodal data in real-world applications can be noisy, which greatly degrades the performance of existing methods that depend on clean labels. In this work, we introduce a novel Multi-Model Cooperative Denoising (MMCD) approach that adaptively clarifies data labels and filters out noisy ones for robust cross-modal retrieval. First, we train multiple CMR models simultaneously to learn cross-modal correspondence more robustly. Second, we propose an adaptive multimodal noise filtering mechanism that dynamically selects clean labels based on the voting results of the multiple models. Third, we propose a denoising cross-modal contrastive learning mechanism that mitigates the impact of noisy labels and enables the CMR models to learn refined semantic correspondence across modalities. Extensive experiments on three CMR benchmark datasets show that the proposed MMCD outperforms state-of-the-art methods in handling noisy labels.
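The idea of selecting clean labels from the votes of several models can be illustrated with a minimal sketch. The abstract does not give the actual voting rule, so everything here is an assumption for illustration only: the function name `vote_filter`, the fixed strict-majority threshold (the mechanism described above is adaptive), and the toy predictions are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import List, Optional, Sequence


def vote_filter(
    predictions: Sequence[Sequence[int]],
    given_labels: Sequence[int],
    min_agree: Optional[int] = None,
) -> List[bool]:
    """Flag each sample as clean (True) or suspected noisy (False).

    predictions[m][i] is the class predicted by model m for sample i.
    A given label is kept when at least `min_agree` models predict it.
    The default threshold (strict majority) is an assumption made for
    this sketch, not the rule used by MMCD.
    """
    n_models = len(predictions)
    if n_models == 0:
        raise ValueError("need at least one model")
    if min_agree is None:
        min_agree = n_models // 2 + 1  # strict majority of the models

    clean = []
    for i, label in enumerate(given_labels):
        # Count how many models voted for each class on sample i.
        votes = Counter(model_preds[i] for model_preds in predictions)
        clean.append(votes[label] >= min_agree)
    return clean


# Toy data: 3 hypothetical models, 4 samples.
preds = [
    [0, 1, 2, 1],
    [0, 1, 0, 1],
    [0, 2, 0, 2],
]
labels = [0, 1, 2, 0]  # possibly noisy annotations
mask = vote_filter(preds, labels)
print(mask)  # [True, True, False, False]
```

In this toy run the last two annotations are flagged because fewer than two of the three models agree with them. A training loop could then restrict the supervised or contrastive loss to the samples marked clean, which is one plausible reading of how the filtering and denoising steps fit together.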
| Original language | English |
|---|---|
| Article number | 57 |
| Journal | Multimedia Systems |
| Volume | 32 |
| Issue number | 1 |
| State | Published - Feb 2026 |
Keywords
- Contrastive learning
- Cross-modal retrieval
- Noisy labels
- Robust learning
Paper title: 'Multi-model cooperative denoising for robust cross-modal retrieval with noisy labels'.