A multi-scale representation and multi-level decision learning network for multimodal sentiment analysis

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal sentiment analysis (MSA) fuses information from different modalities, such as spoken language, acoustic features, and facial expressions, to predict sentiment states. Previous efforts in this area mainly focus on sequential multimodal learning and rely on single-level representations for the final decision. However, these methods fail to learn and leverage the multimodal information present in different layers, which limits prediction accuracy. To address these issues, this paper proposes a novel Multi-scale representation and Multi-level decision Learning Network (MMLN) for multimodal sentiment analysis. Specifically, a Hybrid Multi-scale Transformer (HMT) module progressively learns multimodal representations across multiple scales: it employs shared routing tokens to collect and aggregate multimodal information at different layers, with each layer corresponding to a specific scale. Moreover, a Layer-wise Correlation Learning (LCL) module refines the multi-scale representations and strengthens cross-modal dependencies by minimizing the Jensen-Shannon divergence, thereby facilitating representation learning and fusion at each scale. Finally, a Multi-level Decision Fusion (MDF) module leverages the refined multi-scale representations to produce predictions at multiple levels, which are adaptively fused to improve the robustness of sentiment prediction. Extensive experiments on four widely recognized MSA datasets (two in English and two in Chinese) demonstrate that MMLN outperforms the current leading baselines on most evaluation metrics. Notably, on the MOSEI dataset, our model achieves the best Acc-7 and binary accuracy of 54.6% and 86.7%, respectively, with comparable improvements observed on the other benchmark datasets.
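The two losses the abstract names can be made concrete with a small sketch. The function names (`js_divergence`, `adaptive_fusion`) and the softmax-weighted fusion scheme are illustrative assumptions, not the paper's actual implementation: the Jensen-Shannon term is the symmetric divergence LCL is described as minimizing between per-scale distributions, and the fusion shows one common way multiple per-level predictions could be combined adaptively, as MDF is described as doing.

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.
    Symmetric and bounded by ln(2); LCL is described as minimizing
    such a term to align per-scale representations."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a, b):
        # KL(a || b), skipping zero-probability terms in a.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def adaptive_fusion(preds, scores):
    """Fuse per-level scalar predictions with softmax-normalized
    weights (hypothetical MDF-style combination)."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return sum((e / z) * p for e, p in zip(exps, preds))
```

With equal fusion scores the result reduces to a plain average of the per-level predictions; learned scores would instead let the model emphasize the most reliable level.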

Original language: English
Article number: 129341
Journal: Expert Systems with Applications
Volume: 297
DOIs
State: Published - 1 Feb 2026

Keywords

  • Multi-level decision
  • Multi-scale representation
  • Multimodal fusion
  • Representation learning
  • Sentiment analysis
