基于听说知识融合网络的多模态对话情绪识别

Translated title of the contribution: Listening and speaking knowledge fusion network for multi-modal emotion recognition in conversation
  • Qin Liu
  • Jun Xie*
  • Yong Hu
  • Shu Feng Hao
  • Ya Hui Hao

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multi-modal emotion recognition in conversation aims to identify the emotion of the target utterance from the multi-modal conversation context, which is a primary task in building empathetic dialogue systems (EDS). Existing works consider only the multi-modal conversation itself and ignore knowledge about the listener and the speaker, which limits their ability to capture the emotional features of the target utterance. To address this problem, a listening and speaking knowledge fusion network (LSKFN) is proposed, which introduces external commonsense knowledge and fuses it efficiently with the multi-modal context. The proposed LSKFN consists of four stages, which extract multi-modal context features, integrate listening and speaking knowledge features, eliminate redundant features, and predict the emotion probability distribution. Experimental results on two public multi-modal conversation datasets demonstrate that the LSKFN extracts richer emotional features for the target utterance and achieves better emotion recognition performance than other benchmark models.
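The four-stage pipeline in the abstract can be sketched in minimal form. Everything below is an illustrative assumption: the function names, the additive knowledge fusion, the magnitude-threshold redundancy filter, and the linear-softmax classifier are placeholders standing in for the paper's actual (unspecified here) components.

```python
# Hypothetical sketch of the four LSKFN stages named in the abstract.
# The fusion scheme and all names are illustrative assumptions, not
# the authors' implementation.
from math import exp

EMOTIONS = ["happy", "sad", "neutral", "angry"]  # example label set

def extract_context_features(utterance_feats):
    # Stage 1: pool multi-modal context features of the conversation
    # window (here: a simple element-wise mean over feature vectors).
    dim = len(utterance_feats[0])
    n = len(utterance_feats)
    return [sum(v[i] for v in utterance_feats) / n for i in range(dim)]

def fuse_knowledge(context, listening_k, speaking_k):
    # Stage 2: integrate external listening and speaking knowledge
    # with the context (illustrative additive fusion).
    return [c + 0.5 * (l + s)
            for c, l, s in zip(context, listening_k, speaking_k)]

def eliminate_redundancy(fused, threshold=0.1):
    # Stage 3: zero out low-magnitude components as "redundant".
    return [x if abs(x) >= threshold else 0.0 for x in fused]

def predict_emotion(features, weights):
    # Stage 4: linear scores followed by a softmax over emotion labels.
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    exps = [exp(s) for s in scores]
    total = sum(exps)
    return {e: v / total for e, v in zip(EMOTIONS, exps)}
```

Chaining the stages on toy two-dimensional features, e.g. `predict_emotion(eliminate_redundancy(fuse_knowledge(extract_context_features(feats), lk, sk)), weights)`, yields a probability distribution over `EMOTIONS` that sums to 1.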

Original language: Chinese (Traditional)
Pages (from-to): 2031-2040
Number of pages: 10
Journal: Kongzhi yu Juece/Control and Decision
Volume: 39
Issue number: 6
DOIs
State: Published - Jun 2024

