Visual-textual sentiment classification with bi-directional multi-level attention networks

  • Jie Xu
  • Feiran Huang
  • Xiaoming Zhang*
  • Senzhang Wang
  • Chaozhuo Li
  • Zhoujun Li
  • Yueying He
  • *Corresponding author for this work
  • Beihang University
  • Jinan University
  • Guangdong Key Laboratory of Data Security and Privacy Preserving
  • Nanjing University of Aeronautics and Astronautics
  • National Computer Network Emergency Response Technical Team/Coordination Center of China

Research output: Contribution to journal › Article › peer-review

Abstract

Social networks have become an inseparable part of our daily lives, so automatic sentiment analysis of social media content is of great significance for identifying people's viewpoints, attitudes, and emotions on social websites. Most existing works have concentrated on sentiment analysis of a single modality, such as image or text, and therefore cannot handle social media content that combines multiple modalities, such as images together with text. Although some works have attempted multi-modal sentiment analysis, the complicated correlations between the two modalities have not been fully explored. In this paper, we propose a novel Bi-Directional Multi-Level Attention (BDMLA) model that exploits the complementary and comprehensive information between the image modality and the text modality for joint visual-textual sentiment classification. Specifically, to highlight the emotional regions and words in an image–text pair, we propose a visual attention network and a semantic attention network, respectively. The visual attention network makes the region features of the image interact with multiple semantic levels of the text (word, phrase, and sentence) to obtain attended visual features. The semantic attention network makes the semantic features of the text interact with multiple visual levels of the image (global and local) to obtain attended semantic features. The attended visual and semantic features from the two attention networks are then unified in a holistic framework to perform visual-textual sentiment classification. Proof-of-concept experiments on three real-world datasets verify the effectiveness of our model.
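The bi-directional, multi-level attention described in the abstract can be sketched roughly as follows. This is not the authors' BDMLA implementation: the abstract does not specify the scoring functions, feature extractors, or fusion layer, so this sketch assumes plain dot-product attention, builds hypothetical "phrase" vectors by averaging adjacent word vectors and a hypothetical "sentence" vector by averaging all word vectors, fuses the two directions by concatenation, and classifies with a hypothetical linear softmax layer. All function names, weights, and toy feature values are illustrative.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # shift by the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean(vectors: List[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def attend(keys: List[Vector], query: Vector) -> Vector:
    """Assumed dot-product attention: softmax(key . query) weights, then a weighted sum of keys."""
    weights = softmax([dot(k, query) for k in keys])
    return [sum(w * k[j] for w, k in zip(weights, keys)) for j in range(len(keys[0]))]


def visual_attention(regions: List[Vector], words: List[Vector]) -> Vector:
    """Attend over image regions, guided by three text levels (hypothetical pooling scheme)."""
    # Hypothetical phrase level: average each pair of adjacent words (falls back to words if only one).
    phrases = [mean(words[i:i + 2]) for i in range(len(words) - 1)] or words
    sentence = mean(words)  # hypothetical sentence-level vector
    v_word = mean([attend(regions, w) for w in words])
    v_phrase = mean([attend(regions, p) for p in phrases])
    v_sentence = attend(regions, sentence)
    return v_word + v_phrase + v_sentence  # list concatenation of the three levels


def semantic_attention(words: List[Vector], global_img: Vector, regions: List[Vector]) -> Vector:
    """Attend over words, guided by a global image vector and by local region vectors."""
    t_global = attend(words, global_img)
    t_local = mean([attend(words, r) for r in regions])
    return t_global + t_local  # list concatenation of the two visual levels


def classify(regions: List[Vector], global_img: Vector, words: List[Vector],
             weights: List[Vector]) -> List[float]:
    """Fuse both attended features and apply a hypothetical linear softmax classifier."""
    fused = visual_attention(regions, words) + semantic_attention(words, global_img, regions)
    return softmax([dot(row, fused) for row in weights])


# Toy usage with d = 2 features; fused size is 3*d (visual) + 2*d (semantic) = 10.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
global_img = [0.4, 0.6]
words = [[1.0, 0.2], [0.1, 0.9], [0.3, 0.3]]
weights = [[0.1] * 10, [-0.1] * 10]  # hypothetical 2-class weights
probs = classify(regions, global_img, words, weights)
print(probs)
```

Concatenating the per-level outputs keeps each level's contribution separate for the classifier; averaging them instead would be an equally plausible assumption, since the abstract only says the features are "unified into a holistic framework".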

Original language: English
Pages (from-to): 61-73
Number of pages: 13
Journal: Knowledge-Based Systems
Volume: 178
Status: Published, 15 Aug 2019

Keywords

  • Attention model
  • Multi-modal
  • Sentiment analysis
  • Social image
