SSCCPC-Net: Simultaneously Learning 2D and 3D Features with CLIP for Semantic Scene Completion on Point Cloud

  • Wantong Duan
  • Yongtang Bao
  • Yue Qi*
  • *Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Compared with traditional Scene Completion (SC), Semantic Scene Completion (SSC) is a more challenging task that aims to generate a complete and semantically consistent 3D scene from partial and sparse input data; it is fundamental to fully understanding a scene and being able to interact with it. Consequently, SSC has received much attention in recent years. Most existing methods are voxel-based, which incurs high computational and memory costs. The few point-cloud-based works do not sufficiently exploit the correlation between the semantic segmentation and geometric completion subtasks, and they rely too heavily on point cloud shape features while ignoring the rich texture information that RGB images can provide. In this paper, we present SSCCPC-Net (Semantic Scene Completion with CLIP on Point Cloud-Net), a novel network architecture for point cloud semantic scene completion that combines 2D and 3D features. Inspired by recent work applying large pretrained vision-language models to semantic segmentation, we explore accomplishing the SSC task with the help of the Contrastive Language-Image Pre-Training (CLIP) model. Specifically, we use CLIP features as guidance to fuse the 2D features extracted from the RGB image with the 3D features extracted from the point cloud. The fused features are then fed into our Semantic-Completion Decoder, which performs per-point semantic prediction and semantic-label-assisted point cloud completion. Finally, we obtain a complete, semantically labeled point cloud. Extensive experiments demonstrate that our method is more effective and generalizes better than state-of-the-art methods.
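
The abstract does not say how the CLIP features steer the 2D/3D fusion. The sketch below is a hypothetical illustration only, not the authors' method. It assumes that per-point 2D features have already been gathered (for example, by projecting each point into the RGB image), that 2D and 3D features live in the same embedding space as CLIP text embeddings of the class names, and that each modality is weighted by how confidently it matches those text embeddings. The names `clip_guided_fuse`, `class_probs` and the `temperature` parameter are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(feature: Sequence[float], text_embeds: Sequence[Sequence[float]],
                temperature: float = 0.07) -> Vector:
    """Softmax over cosine similarities between one feature and each class text embedding."""
    f = l2_normalize(feature)
    logits = [sum(a * b for a, b in zip(f, l2_normalize(t))) / temperature
              for t in text_embeds]
    return softmax(logits)


def clip_guided_fuse(feats_2d: Sequence[Sequence[float]],
                     feats_3d: Sequence[Sequence[float]],
                     text_embeds: Sequence[Sequence[float]],
                     temperature: float = 0.07) -> Tuple[List[Vector], List[int]]:
    """Hypothetical fusion: weight each modality per point by its CLIP-text confidence.

    Returns the fused per-point features and a per-point class index.
    """
    fused: List[Vector] = []
    labels: List[int] = []
    for f2, f3 in zip(feats_2d, feats_3d):
        # Confidence of each modality = highest class probability against the text embeddings.
        c2 = max(class_probs(f2, text_embeds, temperature))
        c3 = max(class_probs(f3, text_embeds, temperature))
        alpha = c2 / (c2 + c3)  # share given to the 2D (image) feature
        f = [alpha * a + (1.0 - alpha) * b for a, b in zip(f2, f3)]
        probs = class_probs(f, text_embeds, temperature)
        fused.append(f)
        labels.append(max(range(len(probs)), key=probs.__getitem__))
    return fused, labels


if __name__ == "__main__":
    # Toy 2-D embeddings for two classes, e.g. "wall" and "chair" (illustrative values).
    text = [[1.0, 0.0], [0.0, 1.0]]
    f2d = [[0.9, 0.1], [0.5, 0.5]]  # point 0: image is confident; point 1: image is ambiguous
    f3d = [[0.5, 0.5], [0.1, 0.9]]  # point 0: geometry is ambiguous; point 1: geometry is confident
    print(clip_guided_fuse(f2d, f3d, text))
```

In this toy case each point leans on whichever modality is less ambiguous, so point 0 is labeled class 0 and point 1 class 1. Confidence weighting is just one simple choice; learned cross-attention with CLIP features as queries would be an equally plausible reading of the abstract.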

Original language: English
Title of host publication: Advances in Computer Graphics - 41st Computer Graphics International Conference, CGI 2024, Proceedings
Editors: Nadia Magnenat-Thalmann, Jinman Kim, Bin Sheng, Zhigang Deng, Daniel Thalmann, Ping Li
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 16-29
Number of pages: 14
ISBN (Print): 9783031820236
DOIs
State: Published - 2025
Event: 41st Computer Graphics International Conference, CGI 2024 - Geneva, Switzerland
Duration: 1 Jul 2024 - 5 Jul 2024

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 15340 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 41st Computer Graphics International Conference, CGI 2024
Country/Territory: Switzerland
City: Geneva
Period: 1/07/24 - 5/07/24

Keywords

  • 3D scene semantic segmentation
  • point cloud
  • scene semantic completion
  • scene understanding
