
A Dual Attention-based Modality-Collaborative Fusion Network for Emotion Recognition

  • Xiaoheng Zhang
  • , Yang Li*
  • *Corresponding author for this work
  • Beihang University

Research output: Contribution to journal › Conference article › peer-review

Abstract

Multi-modal emotion recognition (MER) is an emerging research field in human-computer interaction. Previous studies have explored several fusion methods to handle the asynchrony and heterogeneity of multimodal data, but they mostly neglect discriminative unimodal information and thus ignore the independence of each modality. Furthermore, the complementarity among different fusion strategies is seldom taken into consideration. To address these limitations, we propose a modality-collaborative fusion network (MCFN) consisting of three main components: a dual attention-based intra-modal learning module that builds the initial embedding spaces, a modality-collaborative learning approach that reconciles emotional information across modalities, and a two-stage fusion strategy that integrates multimodal features refined by a mutual adjustment approach. The proposed framework outperforms state-of-the-art methods in overall experiments on two well-known public datasets. Our model will be available at https://github.com/zxiaohen/Speech-emotion-recognition-MCFN.
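The pipeline described in the abstract — intra-modal attention per modality, followed by cross-modal collaboration and fusion — can be sketched in simplified form. This is a minimal NumPy illustration of that general pattern, not the paper's actual MCFN: the single-head attention, the pooling-and-concatenate fusion, and all shapes and function names here are assumptions for illustration only (the paper's dual attention and mutual adjustment details are in the full text).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # scaled dot-product attention (single head, no learned projections)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def intra_modal(x):
    # self-attention within one modality (stand-in for the
    # dual attention-based intra-modal learning module)
    return attention(x, x, x)

def cross_modal(a, b):
    # modality-collaborative step: sequence `a` attends to sequence `b`
    return attention(a, b, b)

def two_stage_fusion(audio, text):
    # stage 1: refine each modality independently
    a = intra_modal(audio)
    t = intra_modal(text)
    # stage 2: exchange information across modalities, then
    # mean-pool each sequence and concatenate into one embedding
    a2 = cross_modal(a, t)
    t2 = cross_modal(t, a)
    return np.concatenate([a2.mean(axis=0), t2.mean(axis=0)])

rng = np.random.default_rng(0)
audio = rng.standard_normal((10, 64))  # 10 audio frames, 64-dim features
text = rng.standard_normal((8, 64))    # 8 text tokens, 64-dim features
fused = two_stage_fusion(audio, text)
print(fused.shape)  # (128,) — one joint embedding per utterance
```

In a real model the attention layers would carry learned query/key/value projections and the fused vector would feed an emotion classifier; this sketch only shows how intra-modal and cross-modal attention compose into a two-stage fusion.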

Original language: English
Pages (from-to): 1468-1472
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
State: Published - 2023
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 - 24 Aug 2023

Keywords

  • Intra-modal
  • Modality-collaborative
  • Multimodal emotion recognition

