
Global Context Enhanced Multi-modal Fusion for Referring Image Segmentation

  • Jianhua Yang
  • Yan Huang
  • Linjiang Huang
  • Yunbo Wang
  • Zhanyu Ma
  • Liang Wang*
  • *Corresponding author for this work
  • Beijing University of Posts and Telecommunications
  • NLPR
  • Chinese Academy of Sciences

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Referring image segmentation is a challenging task that aims to segment the object of interest in an image according to a natural language expression. Most existing works directly concatenate the global language representation with local visual features, followed by a convolutional operation to fuse the two modalities. These works overlook that global contextual information from vision is essential for vision-language fusion and for inferring the referred objects. The global context establishes a perception of the full image, so its fusion with the global language representation helps reduce mislabeled pixels among similar objects in an image. To address this issue, we propose a global fusion network (GFNet), which is composed of a visual guided global fusion module and a language guided global fusion module. By modeling the expression-region interactions, the two modules aggregate the expression-related visual contextual information and fuse it with the global representation of the language expression. Moreover, to alleviate the distribution differences between the two modalities, we introduce a channel-wise self-gate on the visual-language concatenated features. We validate the proposed network on four standard datasets, and the experimental results show that our approach outperforms state-of-the-art methods.
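The concatenation-based fusion and the channel-wise self-gate mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the gate's form, so the sketch assumes a sigmoid gate computed from spatially averaged features, and the function names, tensor shapes, and weight matrices are all hypothetical.

```python
import numpy as np


def concat_features(visual, lang):
    """Tile the global language vector over every pixel and concatenate.

    visual: (H, W, Cv) local visual features
    lang:   (Cl,) global language representation
    returns (H, W, Cv + Cl) visual-language concatenated features
    """
    h, w, _ = visual.shape
    tiled = np.broadcast_to(lang, (h, w, lang.shape[0]))
    return np.concatenate([visual, tiled], axis=-1)


def channel_self_gate(features, gate_weight):
    """Rescale each channel by a gate derived from the features themselves.

    Assumed form (not taken from the paper): sigmoid of a linear map
    applied to the spatial mean of the features.
    features:    (H, W, C)
    gate_weight: (C, C) hypothetical learned weights
    """
    pooled = features.mean(axis=(0, 1))                    # (C,)
    gate = 1.0 / (1.0 + np.exp(-(pooled @ gate_weight)))   # (C,), in (0, 1)
    return features * gate, gate


def conv1x1(features, weight):
    """A 1x1 convolution is a per-pixel linear map over channels."""
    return features @ weight                               # (H, W, Co)


rng = np.random.default_rng(0)
visual = rng.normal(size=(4, 5, 8))    # toy feature map
lang = rng.normal(size=(6,))           # toy sentence embedding
fused = concat_features(visual, lang)
gated, gate = channel_self_gate(fused, rng.normal(size=(14, 14)))
output = conv1x1(gated, rng.normal(size=(14, 3)))
print(output.shape)
```

In this sketch the gate lets the network down-weight channels from one modality relative to the other before the convolution mixes them, which is one plausible reading of "alleviating distribution differences"; the paper itself should be consulted for the actual design.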

Original language: English
Title of host publication: Pattern Recognition and Computer Vision - 3rd Chinese Conference, PRCV 2020, Proceedings
Editors: Yuxin Peng, Hongbin Zha, Qingshan Liu, Huchuan Lu, Zhenan Sun, Chenglin Liu, Xilin Chen, Jian Yang
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 434-446
Number of pages: 13
ISBN (Print): 9783030606329
DOI
Publication status: Published - 2020
Externally published
Event: 3rd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2020 - Nanjing, China
Duration: 16 Oct 2020 to 18 Oct 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12305 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 3rd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2020
Country/Territory: China
City: Nanjing
Period: 16/10/20 to 18/10/20
