TY - JOUR
T1 - Multi-model cooperative denoising for robust cross-modal retrieval with noisy labels
AU - Wu, Man
AU - Zhang, Hengmiao
AU - Fang, Jing
AU - Yang, Yang
AU - Luo, Xiong
N1 - Publisher Copyright: © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.
PY - 2026/2
Y1 - 2026/2
N2 - In recent years, the growing volume of multimodal data has drawn sustained attention to the cross-modal retrieval (CMR) task, which shows great potential for applications such as multimedia management and intelligent search. Most approaches map data from different modalities into a common representation space and use label information to separate samples belonging to different semantic classes. However, the labels of multimodal data in real-world applications can be noisy, which severely degrades existing methods that rely on clean labels. In this work, we introduce a novel Multi-Model Cooperative Denoising (MMCD) approach that adaptively distinguishes clean labels from noisy ones and filters out the noisy labels, enabling robust cross-modal retrieval. First, we train multiple CMR models simultaneously to learn cross-modal correspondence more robustly. Second, we propose an adaptive multimodal noise filtering mechanism that dynamically selects clean labels based on the voting results of the multiple models. Third, we propose a denoising cross-modal contrastive learning mechanism that mitigates the impact of noisy labels and enables the CMR models to learn refined semantic correspondence across modalities. Extensive experiments on three CMR benchmark datasets show that MMCD outperforms state-of-the-art methods in handling noisy labels.
AB - In recent years, the growing volume of multimodal data has drawn sustained attention to the cross-modal retrieval (CMR) task, which shows great potential for applications such as multimedia management and intelligent search. Most approaches map data from different modalities into a common representation space and use label information to separate samples belonging to different semantic classes. However, the labels of multimodal data in real-world applications can be noisy, which severely degrades existing methods that rely on clean labels. In this work, we introduce a novel Multi-Model Cooperative Denoising (MMCD) approach that adaptively distinguishes clean labels from noisy ones and filters out the noisy labels, enabling robust cross-modal retrieval. First, we train multiple CMR models simultaneously to learn cross-modal correspondence more robustly. Second, we propose an adaptive multimodal noise filtering mechanism that dynamically selects clean labels based on the voting results of the multiple models. Third, we propose a denoising cross-modal contrastive learning mechanism that mitigates the impact of noisy labels and enables the CMR models to learn refined semantic correspondence across modalities. Extensive experiments on three CMR benchmark datasets show that MMCD outperforms state-of-the-art methods in handling noisy labels.
KW - Contrastive learning
KW - Cross-modal retrieval
KW - Noisy labels
KW - Robust learning
UR - https://www.scopus.com/pages/publications/105026458190
U2 - 10.1007/s00530-025-02121-9
DO - 10.1007/s00530-025-02121-9
M3 - Article
AN - SCOPUS:105026458190
SN - 0942-4962
VL - 32
JO - Multimedia Systems
JF - Multimedia Systems
IS - 1
M1 - 57
ER -