
PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding

  • Zihan Ding
  • Zi Han Ding
  • Tianrui Hui*
  • Junshi Huang
  • Xiaoming Wei
  • Xiaolin Wei
  • Si Liu
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

Abstract

Panoptic Narrative Grounding (PNG) is an emerging task whose goal is to segment visual objects of things and stuff categories described by dense narrative captions of a still image. The previous two-stage approach first extracts segmentation region proposals by an off-the-shelf panoptic segmentation model, then conducts coarse region-phrase matching to ground the candidate regions for each noun phrase. However, the two-stage pipeline usually suffers from the performance limitation of low-quality proposals in the first stage and the loss of spatial details caused by region feature pooling, as well as complicated strategies designed for things and stuff categories separately. To alleviate these drawbacks, we propose a one-stage end-to-end Pixel-Phrase Matching Network (PPMN), which directly matches each phrase to its corresponding pixels instead of region proposals and outputs panoptic segmentation by simple combination. Thus, our model can exploit sufficient and finer cross-modal semantic correspondence from the supervision of densely annotated pixel-phrase pairs rather than sparse region-phrase pairs. In addition, we propose a Language-Compatible Pixel Aggregation (LCPA) module to further enhance the discriminative ability of phrase features through multi-round refinement, which selects the most compatible pixels for each phrase to adaptively aggregate the corresponding visual context. Extensive experiments show that our method achieves new state-of-the-art performance on the PNG benchmark with an absolute Average Recall gain of 4.0.
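
The two ideas in the abstract, matching a phrase directly to pixels and refining the phrase feature from its most compatible pixels, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how compatibility is scored or how context is aggregated, so the dot-product score, the sigmoid threshold, the top-k mean, the residual update, and the names `match_pixels` and `aggregate_top_k` are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def match_pixels(phrase, pixels, threshold=0.5):
    """Score every pixel against one phrase and threshold into a mask.

    Assumption: compatibility is a sigmoid over a dot product.
    """
    scores = [1.0 / (1.0 + math.exp(-dot(phrase, p))) for p in pixels]
    mask = [1 if s > threshold else 0 for s in scores]
    return scores, mask


def aggregate_top_k(phrase, pixels, k=2):
    """Refine a phrase feature with the mean of its k most compatible pixels.

    Assumption: top-k selection, mean pooling, and a residual add.
    """
    ranked = sorted(pixels, key=lambda p: dot(phrase, p), reverse=True)[:k]
    context = [sum(column) / len(ranked) for column in zip(*ranked)]
    return [a + c for a, c in zip(phrase, context)]


# Toy data: four pixels with 2-dimensional features and one phrase feature.
pixels = [[2.0, 0.0], [1.5, 0.5], [-2.0, 0.0], [0.0, -3.0]]
phrase = [1.0, 1.0]

# Two rounds of refinement, standing in for "multi-round refinement".
refined = phrase
for _ in range(2):
    refined = aggregate_top_k(refined, pixels, k=2)

# Match the refined phrase against every pixel; no region proposals involved.
scores, mask = match_pixels(refined, pixels)
print(mask)
```

In this toy setting the mask marks the first two pixels, the ones whose features align with the phrase. A real model would learn the features and operate on full-resolution feature maps.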

Original language: English
Title of host publication: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 5537-5546
Number of pages: 10
ISBN (Electronic): 9781450392037
DOI
Publication status: Published - 10 Oct 2022
Event: 30th ACM International Conference on Multimedia, MM 2022 - Lisboa, Portugal
Duration: 10 Oct 2022 to 14 Oct 2022

Publication series

Name: MM 2022 - Proceedings of the 30th ACM International Conference on Multimedia

Conference

Conference: 30th ACM International Conference on Multimedia, MM 2022
Country/Territory: Portugal
City: Lisboa
Period: 10/10/22 to 14/10/22
