Duplicate Multi-modal Entities Detection with Graph Contrastive Self-training Network

  • Shuyun Gu
  • Xiao Wang
  • Chuan Shi*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Duplicate multi-modal entity detection aims to find highly similar entities among massive collections of entities with multi-modal information. It is a basic task in many applications and is becoming more important and urgent with the development of the Internet and e-commerce platforms. Traditional methods apply machine learning or deep learning to feature embeddings extracted from multi-modal information, which ignores the correlations among entities and modalities. Inspired by the popular Graph Neural Networks (GNNs), we can analyze the multi-relation graph of entities constructed from their multi-modal information with a GNN. However, this solution still faces the challenge of extreme label sparsity, particularly in industrial applications. In this work, we propose a novel graph contrastive self-training network model, named CT-GNN, for duplicate multi-modal entity detection under extreme label sparsity. Given the multi-relation graph of entities constructed from their multi-modal features with KNN, we first learn preliminary node embeddings with an existing GNN, e.g., a GCN. To alleviate the problem of extremely sparse labels, we design a layer contrastive module to effectively exploit implicit label information, as well as a pseudo-label extension module to determine the label boundary. In addition, graph structure learning is introduced to refine the structure of the multi-relation graph. A unified optimization framework is designed to seamlessly integrate these three components. Extensive experiments on real datasets, in comparison with SOTA baselines, demonstrate the effectiveness of the proposed method.
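The graph construction step mentioned in the abstract (one KNN relation per modality) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the similarity measure, so cosine similarity is an assumption here, and the names `cosine`, `knn_edges`, and `multi_relation_graph`, as well as the toy feature vectors, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def knn_edges(features, k):
    """Undirected edge set linking each entity to its k most similar others."""
    edges = set()
    for i, fi in enumerate(features):
        scored = [(cosine(fi, fj), j) for j, fj in enumerate(features) if j != i]
        # Highest similarity first; ties broken by the smaller entity index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, j in scored[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def multi_relation_graph(modal_features, k):
    """One KNN edge set (relation) per modality, keyed by modality name."""
    return {name: knn_edges(feats, k) for name, feats in modal_features.items()}


# Hypothetical toy data: four entities, two modalities.
toy = {
    "text": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
    "image": [[1.0, 1.0], [0.0, 1.0], [1.0, 0.9], [0.1, 1.0]],
}
graph = multi_relation_graph(toy, k=1)
```

In this toy data, the text relation links entities 0 and 1 and entities 2 and 3, while the image relation links 0 and 2 and 1 and 3. The two modalities disagree about which entities are neighbors, which is why each relation is kept as a separate edge set instead of being merged into a single graph.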

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases
Subtitle of host publication: Research Track - European Conference, ECML PKDD 2023, Proceedings
Editors: Danai Koutra, Claudia Plant, Manuel Gomez Rodriguez, Elena Baralis, Francesco Bonchi
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 651-665
Number of pages: 15
ISBN (Print): 9783031434143
DOIs
State: Published - 2023
Externally published: Yes
Event: 23rd Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2023 - Turin, Italy
Duration: 18 Sep 2023 → 22 Sep 2023

Publication series

Name: Lecture Notes in Computer Science
Volume: 14170 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 23rd Joint European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2023
Country/Territory: Italy
City: Turin
Period: 18/09/23 → 22/09/23

Keywords

  • Duplicate entities
  • Graph learning
  • Self-supervised learning
  • Self-training learning
