Visual and Audio Aware Bi-Modal Video Emotion Recognition

  • Siqi Xiang
  • Wenge Rong
  • Zhang Xiong
  • Min Gao
  • Qingyu Xiong

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

With the rapid growth of online video, analyzing and predicting the affective impact that video content will have on viewers has attracted much attention in the community. To address this challenge, several different kinds of information about video clips have been exploited. Traditional methods normally focused on a single modality, either audio or visual. Later, some researchers tried to establish multi-modal schemes, spending a lot of time choosing and extracting features under different fusion strategies. In this research, we propose an end-to-end model that automatically extracts features and targets an emotion classification task by integrating audio and visual features and also incorporating the temporal characteristics of the video. An experimental study on the commonly used MediaEval 2015 Affective Impact of Movies task shows the method's potential, and it is expected that this work could provide some insight for future video emotion recognition from a feature fusion perspective.

Original language: English
Title of host publication: CogSci 2017 - Proceedings of the 39th Annual Meeting of the Cognitive Science Society
Subtitle of host publication: Computational Foundations of Cognition
Publisher: The Cognitive Science Society
Pages: 3554-3559
Number of pages: 6
ISBN (Electronic): 9780991196760
State: Published - 2017
Event: 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition, CogSci 2017 - London, United Kingdom
Duration: 26 Jul 2017 to 29 Jul 2017

Publication series

Name: CogSci 2017 - Proceedings of the 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition

Conference

Conference: 39th Annual Meeting of the Cognitive Science Society: Computational Foundations of Cognition, CogSci 2017
Country/Territory: United Kingdom
City: London
Period: 26/07/17 to 29/07/17

Keywords

  • end-to-end
  • modal fusion
  • multi-modal scheme
  • temporal characteristics
  • videos
