TY - GEN
T1 - SSCCPC-Net
T2 - 41st Computer Graphics International Conference, CGI 2024
AU - Duan, Wantong
AU - Bao, Yongtang
AU - Qi, Yue
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Compared with traditional Scene Completion (SC), Semantic Scene Completion (SSC) is a more challenging task that aims to generate a complete and semantically consistent 3D scene from partial and sparse input data, which is fundamental to fully understanding the scene and interacting with it. Consequently, the SSC task has received much attention in recent years. Most existing methods are voxel-based and have high computational and memory requirements. The few point-cloud-based works do not sufficiently exploit the correlation between the semantic segmentation and geometric completion subtasks; they also focus too heavily on point cloud shape features and ignore the rich texture information that RGB images can provide. In this paper, we present SSCCPC-Net (Semantic Scene Completion with CLIP on Point Cloud-Net), a novel network architecture for point cloud semantic scene completion that combines 2D and 3D features. Inspired by recent work applying large pretrained vision-language models to semantic segmentation, we explore accomplishing the SSC task with the help of the Contrastive Language-Image Pre-Training (CLIP) model. Specifically, we use CLIP features as guidance to fuse the 2D features extracted from the RGB image with the 3D features extracted from the point cloud. The fused features are then fed into our Semantic-Completion Decoder for per-point semantic prediction and semantic-label-assisted point cloud completion. Finally, we obtain a complete, semantically labeled point cloud. Extensive experiments demonstrate that our method is more effective and generalizes better than state-of-the-art methods.
KW - 3D scene semantic segmentation
KW - point cloud
KW - scene semantic completion
KW - scene understanding
UR - https://www.scopus.com/pages/publications/86000440524
U2 - 10.1007/978-3-031-82024-3_2
DO - 10.1007/978-3-031-82024-3_2
M3 - Conference contribution
AN - SCOPUS:86000440524
SN - 9783031820236
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 16
EP - 29
BT - Advances in Computer Graphics - 41st Computer Graphics International Conference, CGI 2024, Proceedings
A2 - Magnenat-Thalmann, Nadia
A2 - Kim, Jinman
A2 - Sheng, Bin
A2 - Deng, Zhigang
A2 - Thalmann, Daniel
A2 - Li, Ping
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 1 July 2024 through 5 July 2024
ER -