Abstract
Multimodal emotion recognition (MER) is an emerging research field in human-computer interaction. Previous studies have explored several fusion methods to handle the asynchrony and heterogeneity of multimodal data, but they mostly neglect discriminative unimodal information and thus overlook the independence of each modality. Furthermore, the complementarity among different fusion strategies is seldom taken into consideration. To address these limitations, we propose a modality-collaborative fusion network (MCFN) consisting of three main components: a dual attention-based intra-modal learning module that builds the initial embedding spaces, a modality-collaborative learning approach that reconciles emotional information across modalities, and a two-stage fusion strategy that integrates multimodal features refined by a mutual adjustment approach. The proposed framework outperforms state-of-the-art methods in overall experiments on two well-known public datasets. Our model will be available at https://github.com/zxiaohen/Speech-emotion-recognition-MCFN.
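The abstract does not specify implementation details, so the following is a minimal PyTorch sketch of the three-component layout it describes, for illustration only. Every concrete choice here is an assumption rather than the paper's method: the pairing of temporal self-attention with channel gating as the "dual attention," cross-modal attention as the collaborative step, mean-pooled late fusion as the second stage, and all module names and dimensions.

```python
# Illustrative sketch only: module names, dimensions, and the use of
# PyTorch are assumptions; the paper's exact architecture is not given here.
import torch
import torch.nn as nn

class DualAttentionEncoder(nn.Module):
    """Hypothetical intra-modal encoder: temporal self-attention plus a
    channel-gating branch, standing in for the paper's 'dual attention'."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                       # x: (batch, time, dim)
        attn_out, _ = self.self_attn(x, x, x)   # temporal self-attention
        gated = attn_out * self.gate(attn_out)  # channel-wise gating
        return self.norm(x + gated)             # residual + layer norm

class MCFNSketch(nn.Module):
    """Three-component layout mirroring the abstract: intra-modal encoders,
    a collaborative cross-modal step, and a two-stage fusion head."""
    def __init__(self, dim: int = 128, num_classes: int = 4):
        super().__init__()
        self.audio_enc = DualAttentionEncoder(dim)
        self.text_enc = DualAttentionEncoder(dim)
        # Stage 1: cross-modal attention reconciles information across modalities.
        self.cross_attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        # Stage 2: late fusion of pooled unimodal and cross-modal features.
        self.classifier = nn.Linear(3 * dim, num_classes)

    def forward(self, audio, text):             # each: (batch, time, dim)
        a, t = self.audio_enc(audio), self.text_enc(text)
        cross, _ = self.cross_attn(a, t, t)     # audio queries attend to text
        pooled = [z.mean(dim=1) for z in (a, t, cross)]  # temporal mean-pool
        return self.classifier(torch.cat(pooled, dim=-1))

if __name__ == "__main__":
    model = MCFNSketch()
    logits = model(torch.randn(2, 50, 128), torch.randn(2, 30, 128))
    print(logits.shape)  # torch.Size([2, 4])
```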
| Original language | English |
|---|---|
| Pages (from-to) | 1468-1472 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2023-August |
| DOIs | |
| State | Published - 2023 |
| Event | 24th Annual Conference of the International Speech Communication Association, Interspeech 2023, Dublin, Ireland, 20 Aug 2023 → 24 Aug 2023 |
Keywords
- Intra-modal
- Modality-collaborative
- Multimodal emotion recognition