跳到主要导航 跳到搜索 跳到主要内容

3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds

  • Beihang University
  • The University of Sydney

科研成果: 书/报告/会议事项章节会议稿件同行评审

摘要

Observing that the 3D captioning task and the 3D grounding task contain both shared and complementary information in nature, in this work, we propose a unified framework to jointly solve these two distinct but closely related tasks in a synergistic fashion, which consists of both shared task-agnostic modules and lightweight task-specific modules. On one hand, the shared task-agnostic modules aim to learn precise locations of objects, fine-grained attribute features to characterize different objects, and complex relations between objects, which benefit both captioning and visual grounding. On the other hand, by casting each of the two tasks as the proxy task of another one, the lightweight task-specific modules solve the captioning task and the grounding task respectively. Extensive experiments and ablation study on three 3D vision and language datasets demonstrate that our joint training frame-work achieves significant performance gains for each individual task and finally improves the state-of-the-art performance for both captioning and grounding tasks.

源语言英语
主期刊名Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
出版商IEEE Computer Society
16443-16452
页数10
ISBN(电子版)9781665469463
DOI
出版状态已出版 - 2022
活动2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022 - New Orleans, 美国
期限: 19 6月 202224 6月 2022

出版系列

姓名Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
2022-June
ISSN(印刷版)1063-6919

会议

会议2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
国家/地区美国
New Orleans
时期19/06/2224/06/22

指纹

探究 '3DJCG: A Unified Framework for Joint Dense Captioning and Visual Grounding on 3D Point Clouds' 的科研主题。它们共同构成独一无二的指纹。

引用此