
Examine before you answer: Multi-task Learning with Adaptive-attentions for Multiple-choice VQA

  • Lianli Gao
  • Pengpeng Zeng
  • Jingkuan Song
  • Xianglong Liu
  • Heng Tao Shen*

*Corresponding author for this work

University of Electronic Science and Technology of China

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multiple-choice (MC) Visual Question Answering (VQA) is similar to open-ended VQA but differs in an essential way: the answer options are provided. Most existing works handle both tasks in a unified pipeline, solving a multi-class classification problem to infer the best answer from a predefined answer set; for MC VQA, the option that matches this best answer is then selected. This, however, does not reflect how humans reason. People normally examine the question, the answer options, and the reference image before answering an MC VQA question. They either rely on the question and answer options alone to deduce the correct answer when the question is not image-related, or read the question and answer options and then purposefully search the reference image for the answer. We therefore propose a novel approach, Multi-task Learning with Adaptive-attention (MTA), to simulate this human reasoning for MC VQA. Specifically, we first fuse the answer-option and question features, and then adaptively attend to the visual features to infer the answer. Furthermore, we design our model as a multi-task learning architecture that integrates the open-ended VQA task to further boost MC VQA performance. We evaluate our approach on two standard benchmark datasets, VQA and Visual7W. It sets new records on both for the MC VQA task, reaching 73.5% and 65.9% average accuracy, respectively.

Original language: English
Title of host publication: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference
Publisher: Association for Computing Machinery, Inc
Pages: 1742-1750
Number of pages: 9
ISBN (Electronic): 9781450356657
DOI
Publication status: Published - 15 Oct 2018
Event: 26th ACM Multimedia conference, MM 2018 - Seoul, South Korea
Duration: 22 Oct 2018 to 26 Oct 2018

Publication series

Name: MM 2018 - Proceedings of the 2018 ACM Multimedia Conference

Conference

Conference: 26th ACM Multimedia conference, MM 2018
Country/Territory: South Korea
City: Seoul
Period: 22/10/18 to 26/10/18
