Enriching Phrases with Coupled Pixel and Object Contexts for Panoptic Narrative Grounding

  • Tianrui Hui
  • Zihan Ding
  • Junshi Huang*
  • Xiaoming Wei
  • Xiaolin Wei
  • Jiao Dai
  • Jizhong Han
  • Si Liu
  • *Corresponding author for this work
  • CAS - Institute of Information Engineering
  • University of Chinese Academy of Sciences
  • Meituan
  • Beihang University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

Panoptic narrative grounding (PNG) aims to segment things and stuff objects in an image described by noun phrases of a narrative caption. As a multimodal task, an essential aspect of PNG is the visual-linguistic interaction between image and caption. The previous two-stage method aggregates visual contexts from offline-generated mask proposals to phrase features, which tend to be noisy and fragmentary. The recent one-stage method aggregates only pixel contexts from image features to phrase features, which may incur semantic misalignment due to lacking object priors. To realize more comprehensive visual-linguistic interaction, we propose to enrich phrases with coupled pixel and object contexts by designing a Phrase-Pixel-Object Transformer Decoder (PPO-TD), where both fine-grained part details and coarse-grained entity clues are aggregated to phrase features. In addition, we also propose a Phrase-Object Contrastive Loss (POCL) to pull closer the matched phrase-object pairs and push away unmatched ones for aggregating more precise object contexts from more phrase-relevant object tokens. Extensive experiments on the PNG benchmark show our method achieves new state-of-the-art performance with large margins.
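
The abstract describes the Phrase-Object Contrastive Loss only at a high level: matched phrase-object pairs are pulled closer and unmatched ones pushed apart. The block below is a minimal sketch of that general idea using a generic InfoNCE-style formulation. It is not the paper's actual POCL definition. The function names, the cosine similarity, the `temperature` value, and the `matches` data structure are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)


def phrase_object_contrastive_loss(phrases, objects, matches, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's formula).

    phrases: list of phrase feature vectors
    objects: list of object token feature vectors
    matches: matches[i] is the set of object indices matched to phrase i
    """
    total, counted = 0.0, 0
    for i, phrase in enumerate(phrases):
        if not matches[i]:
            continue  # phrases without a matched object contribute nothing
        logits = [cosine(phrase, obj) / temperature for obj in objects]
        # Numerically stable log-sum-exp over all object tokens.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Average negative log-probability of the matched objects.
        log_probs = [logits[j] - log_denom for j in matches[i]]
        total += -sum(log_probs) / len(log_probs)
        counted += 1
    return total / counted if counted else 0.0


# Toy features: each phrase points in the same direction as one object token.
phrases = [[1.0, 0.0], [0.0, 1.0]]
objects = [[1.0, 0.0], [0.0, 1.0]]
aligned = phrase_object_contrastive_loss(phrases, objects, [{0}, {1}])
swapped = phrase_object_contrastive_loss(phrases, objects, [{1}, {0}])
```

In this toy setup the loss is small when each phrase is labelled as matching its most similar object token and large when the labels are swapped, which is the behaviour a pull-closer, push-away objective is meant to have.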

Original language: English
Title of host publication: Proceedings of the 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Editors: Edith Elkind
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 893-901
Number of pages: 9
ISBN (Electronic): 9781956792034
DOIs
State: Published - 2023
Event: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023 - Macao, China
Duration: 19 Aug 2023 to 25 Aug 2023

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
Volume: 2023-August
ISSN (Print): 1045-0823

Conference

Conference: 32nd International Joint Conference on Artificial Intelligence, IJCAI 2023
Country/Territory: China
City: Macao
Period: 19/08/23 to 25/08/23
