
DualVD: An adaptive dual encoding model for deep visual understanding in visual dialogue

  • Xiaoze Jiang
  • Jing Yu*
  • Zengchang Qin
  • Yingying Zhuang
  • Xingxing Zhang
  • Yue Hu
  • Qi Wu
  • *Corresponding author for this work
  • CAS - Institute of Information Engineering
  • Beihang University
  • Microsoft USA
  • University of Adelaide

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Unlike the Visual Question Answering task, which requires answering only one question about an image, Visual Dialogue involves multiple questions that cover a broad range of visual content and could relate to any objects, relationships or semantics. The key challenge in the Visual Dialogue task is thus to learn a more comprehensive and semantically rich image representation that can adapt its attention over the image to different questions. In this research, we propose a novel model to depict an image from both visual and semantic perspectives. Specifically, the visual view helps capture appearance-level information, including objects and their relationships, while the semantic view enables the agent to understand high-level visual semantics from the whole image to the local regions. Furthermore, on top of these multi-view image features, we propose a feature selection framework that adaptively captures question-relevant information hierarchically at a fine-grained level. The proposed method achieved state-of-the-art results on benchmark Visual Dialogue datasets. More importantly, we can tell which modality (visual or semantic) contributes more to answering the current question by visualizing the gate values. This gives us insights into human cognition in Visual Dialogue.
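The gating idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the formulation from the paper: it assumes a single scalar gate computed from the concatenated question, visual and semantic vectors, and the names `gated_fusion`, `weights` and `bias` are hypothetical. It only shows how a gate value close to 1 or 0 would indicate which view dominates the fused representation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(visual, semantic, question, weights, bias=0.0):
    """Blend a visual and a semantic feature vector with one scalar gate.

    Assumption (not from the paper): the gate is a sigmoid over a linear
    score of the concatenated [question; visual; semantic] features.
    Returns the fused vector and the gate value, where a gate near 1
    favours the visual view and a gate near 0 favours the semantic view.
    """
    if len(visual) != len(semantic):
        raise ValueError("visual and semantic vectors must have equal length")
    features = list(question) + list(visual) + list(semantic)
    if len(weights) != len(features):
        raise ValueError("weights must match the concatenated feature length")

    # Linear score conditioned on the question and both image views.
    score = sum(w * x for w, x in zip(weights, features)) + bias
    gate = sigmoid(score)

    # Convex combination of the two views, weighted by the gate.
    fused = [gate * v + (1.0 - gate) * s for v, s in zip(visual, semantic)]
    return fused, gate


# Toy usage with made-up numbers: zero weights give a neutral gate.
fused, gate = gated_fusion(
    visual=[1.0, 0.0],
    semantic=[0.0, 1.0],
    question=[1.0],
    weights=[0.0, 0.0, 0.0, 0.0, 0.0],
)
```

Inspecting `gate` per question is the kind of signal the abstract describes visualizing. A learned model would obtain `weights` and `bias` from training rather than fixing them by hand.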

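Original language: English
Title of host publication: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence
Publisher: AAAI Press
Pages: 11125-11132
Number of pages: 8
ISBN (electronic): 9781577358350
Publication status: Published - 2020
Event: 34th AAAI Conference on Artificial Intelligence, AAAI 2020 - New York, United States
Duration: 7 Feb 2020 to 12 Feb 2020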

Publication series

Name: AAAI 2020 - 34th AAAI Conference on Artificial Intelligence

Conference

Conference: 34th AAAI Conference on Artificial Intelligence, AAAI 2020
Country/Territory: United States
City: New York
Period: 7/02/20 to 12/02/20
