
Visual Question Answering via Combining Inferential Attention and Semantic Space Mapping

  • Beihang University
  • Hefei University
  • Jinan University
  • National Computer Network Emergency Response Technical Team/Coordination Center of China

Research output: Contribution to journal › Article › peer-review

Abstract

Visual Question Answering (VQA) has attracted widespread interest in recent years. Its goal is to exploit the close correlations between an image and a question in order to infer the answer. We make two observations about the VQA task: (1) the set of possible answers keeps growing, so predicting only over a pre-defined set of labeled answers may lead to errors, because an unlabeled answer may be the correct one for a given question–image pair; (2) when humans answer visual questions, the gradual shift of their attention plays an important guiding role in exploring the correlations between images and questions. Based on these observations, we propose a novel VQA model that combines Inferential Attention and Semantic Space Mapping (IASSM). The model has two salient aspects: (1) a semantic space shared by both labeled and unlabeled answers is constructed to learn new answers, in which the joint embedding of a question and its corresponding image is mapped and clustered around the answer exemplar; (2) a novel inferential attention model is designed to simulate the learning process of human attention in order to explore the correlations between the image and the question, focusing on the more important question words and on the image regions associated with the question. Both the inferential attention and the semantic space mapping modules are integrated into an end-to-end framework to infer the answer. Experiments on two public VQA datasets and our newly constructed dataset show the superiority of IASSM over existing methods.
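
The two ideas in the abstract can be illustrated with a minimal sketch. This is not the IASSM implementation: the abstract does not give the model's equations, so the single softmax attention step, the additive fusion of question and image features, the cosine-similarity lookup, and every function name below are assumptions chosen for illustration only. The sketch attends over image regions using the question vector, forms a joint embedding, and picks the closest answer exemplar in a shared space that may contain answers never seen as training labels.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(question, regions):
    """Question-guided attention (assumed form): weight each image
    region by softmax(question . region) and return the weighted sum."""
    weights = softmax([dot(question, r) for r in regions])
    dim = len(regions[0])
    pooled = [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]
    return pooled, weights


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def nearest_answer(joint, exemplars):
    """Return the answer whose exemplar is closest to the joint embedding.
    `exemplars` maps answer text to a vector in the shared semantic space."""
    return max(exemplars, key=lambda name: cosine(joint, exemplars[name]))


def answer(question, regions, exemplars):
    """Attend, fuse by element-wise addition (an assumption), then look up."""
    pooled, _ = attend(question, regions)
    joint = [q + p for q, p in zip(question, pooled)]
    return nearest_answer(joint, exemplars)
```

Because prediction is a nearest-exemplar lookup rather than a fixed-size classifier, a new answer can be supported in this sketch by adding its vector to `exemplars`, which is the property the shared semantic space is meant to provide.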

Original language: English
Article number: 106339
Journal: Knowledge-Based Systems
Volume: 207
State: Published - 5 Nov 2020

Keywords

  • Inferential attention
  • Semantic space mapping
  • Visual Question Answering
