TY - GEN
T1 - Duplicate Multi-modal Entities Detection with Graph Contrastive Self-training Network
AU - Gu, Shuyun
AU - Wang, Xiao
AU - Shi, Chuan
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023
Y1 - 2023
AB - Duplicate multi-modal entity detection aims to find highly similar entities among massive collections of entities with multi-modal information. It is a basic task in many applications and is becoming more important and urgent with the development of the Internet and e-commerce platforms. Traditional methods apply machine learning or deep learning to feature embeddings extracted from multi-modal information, which ignores the correlations among entities and modalities. Inspired by the popular Graph Neural Networks (GNNs), we can analyze the multi-relation graph of entities constructed from their multi-modal information with a GNN. However, this solution still faces the challenge of extreme label sparsity, particularly in industrial applications. In this work, we propose a novel graph contrastive self-training network model, named CT-GNN, for duplicate multi-modal entity detection under extreme label sparsity. Given the multi-relation graph of entities constructed from their multi-modal features with KNN, we first learn preliminary node embeddings with an existing GNN, e.g., a GCN. To alleviate the problem of extremely sparse labels, we design a layer contrastive module to effectively exploit implicit label information, as well as a pseudo-label extension module to determine the label boundary. In addition, graph structure learning is introduced to refine the structure of the multi-relation graph. A unified optimization framework is designed to seamlessly integrate these three components. Extensive experiments on real datasets, in comparison with SOTA baselines, demonstrate the effectiveness of our proposed method.
KW - Duplicate entities
KW - Graph learning
KW - Self-supervised learning
KW - Self-training learning
UR - https://www.scopus.com/pages/publications/85174442585
U2 - 10.1007/978-3-031-43415-0_38
DO - 10.1007/978-3-031-43415-0_38
M3 - Conference contribution
AN - SCOPUS:85174442585
SN - 9783031434143
T3 - Lecture Notes in Computer Science
SP - 651
EP - 665
BT - Machine Learning and Knowledge Discovery in Databases
A2 - Koutra, Danai
A2 - Plant, Claudia
A2 - Gomez Rodriguez, Manuel
A2 - Baralis, Elena
A2 - Bonchi, Francesco
PB - Springer Science and Business Media Deutschland GmbH
T2 - 23rd Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2023
Y2 - 18 September 2023 through 22 September 2023
ER -