3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

  • Daigang Cai
  • Lichen Zhao
  • Jing Zhang*
  • Lu Sheng
  • Dong Xu
  • *Corresponding author for this work
  • Beihang University
  • The University of Sydney

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules. On one hand, the shared task-agnostic modules aim to learn precise locations of objects, fine-grained attribute features to characterize different objects, and complex relations between objects, which benefit both captioning and visual grounding. On the other hand, by casting each of the two tasks as a proxy task for the other, the lightweight task-specific modules solve the captioning task and the grounding task respectively. Extensive experiments and ablation studies on three 3D vision and language datasets demonstrate that our joint training framework achieves significant performance gains for each individual task and improves the state-of-the-art performance for both captioning and grounding tasks.

Original language: English
Title of host publication: Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Publisher: IEEE Computer Society
Pages: 16443-16452
Number of pages: 10
ISBN (Electronic): 9781665469463
State: Published - 2022
Event: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, United States
Duration: 19 Jun 2022 to 24 Jun 2022

Publication series

Name: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume: 2022-June
ISSN (Print): 1063-6919

Conference

Conference: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
Country/Territory: United States
City: New Orleans
Period: 19/06/22 to 24/06/22

Keywords

  • 3D from multi-view and sensors
  • Recognition: detection, categorization, retrieval
  • Vision + language
