Measuring Similarity of Dual-Modal Academic Data Based on Multi-Fusion Representation Learning

  • Li Zhang
  • Qiang Gao
  • Ming Liu*
  • Zepeng Gu
  • Bo Lang
  • *Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Nowadays, academic materials such as articles, patents, lecture notes, and observation records often use both text and images (i.e., dual-modal data) to illustrate scientific issues. Measuring the similarity of such dual-modal academic data depends largely on dual-modal feature representations, which remain far from satisfactory in practice. To learn dual-modal representations, most current approaches mine interactions between texts and images on top of a fusion network. This work proposes a multi-fusion deep learning framework that learns semantically richer dual-modal representations. The framework places multiple fusion points in feature spaces at various levels and gradually integrates fusion information from the low level to the high level. In addition, we develop a multi-channel decoding network with an alternate fine-tuning strategy to thoroughly mine modality-specific features and cross-modal correlations. To our knowledge, this is the first work to introduce deep representation learning for dual-modal academic data. Our framework reduces the semantic and statistical-attribute differences between the two modalities, thereby learning robust representations. Extensive experiments on real-world datasets show that our method performs significantly better than state-of-the-art approaches.
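To make the multi-fusion idea concrete, the block below is a minimal PyTorch-style sketch, not the authors' implementation: it assumes hypothetical per-level text and image encoder stages, fuses the two modalities at each level with a simple concatenate-and-project block, and carries each level's fused signal into the next (lower-to-higher) level. All module names, dimensions, and the concatenation-based fusion are illustrative assumptions.

    # Illustrative sketch only -- not the paper's actual architecture.
    import torch
    import torch.nn as nn

    class FusionBlock(nn.Module):
        """One fusion point: merge text and image features at a single level."""
        def __init__(self, dim):
            super().__init__()
            self.proj = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

        def forward(self, text_feat, image_feat):
            # Concatenate the two modalities and project back to `dim`.
            return self.proj(torch.cat([text_feat, image_feat], dim=-1))

    class MultiFusionEncoder(nn.Module):
        """Fuses the two modalities at several levels, from low to high."""
        def __init__(self, dim=256, num_levels=3):
            super().__init__()
            # Hypothetical per-level transforms standing in for real encoder stages.
            self.text_stages = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_levels))
            self.image_stages = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_levels))
            self.fusions = nn.ModuleList(FusionBlock(dim) for _ in range(num_levels))

        def forward(self, text_x, image_x):
            fused = 0.0  # fused signal carried up from the previous (lower) level
            for t_stage, i_stage, fuse in zip(self.text_stages, self.image_stages, self.fusions):
                # Each stage sees its own modality plus the lower-level fusion.
                text_x = torch.relu(t_stage(text_x + fused))
                image_x = torch.relu(i_stage(image_x + fused))
                fused = fuse(text_x, image_x)  # new fusion point at this level
            return fused  # highest-level dual-modal representation

    # Usage: cosine similarity between two dual-modal documents.
    enc = MultiFusionEncoder()
    doc_a = enc(torch.randn(1, 256), torch.randn(1, 256))
    doc_b = enc(torch.randn(1, 256), torch.randn(1, 256))
    similarity = torch.cosine_similarity(doc_a, doc_b)

Under this reading, similarity between two dual-modal documents is simply a distance between their fused top-level representations; the multi-channel decoder and alternate fine-tuning described in the abstract are omitted here.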

Original language: English
Pages (from-to): 97701-97711
Number of pages: 11
Journal: IEEE Access
Volume: 12
DOIs
State: Published - 2024

Keywords

  • Scholarly big data
  • deep learning
  • dual-modal academic data
  • multi-fusion
