TY - JOUR
T1 - ThickSeg
T2 - Efficient semantic segmentation of large-scale 3D point clouds using multi-layer projection
AU - Gao, Qian
AU - Shen, Xukun
N1 - Publisher Copyright:
© 2021 Elsevier B.V.
PY - 2021/4
Y1 - 2021/4
N2 - Efficient semantic segmentation of large-scale three-dimensional (3D) point clouds is an essential capability for intelligent robots to perceive their surrounding environment. However, due to expensive sampling processes or time-consuming pre/post-processing steps, most current solutions are inefficient or limited in scale. In this paper, we propose a novel framework, ThickSeg, to efficiently assign semantic labels to large-scale point clouds. ThickSeg consists of three main steps: First, it projects raw point clouds onto a multi-layer image with a random-hit strategy to efficiently preserve more local geometric features. Second, the projected multi-layer image is fed into a Self-Sorting 3D Convolutional Neural Network (SS-3DCNN) to predict grid-wise semantics, which are subsequently projected back to their corresponding 3D points. Finally, the labels of occluded points are determined by an iterative and accumulative post-processing mechanism, avoiding time-consuming explicit 3D neighborhood searching. We validate our approach on two well-known public benchmarks (SemanticKITTI and KITTI), where ThickSeg achieves state-of-the-art results and is more efficient than previous methods. Our detailed ablation study shows how each component contributes to the final performance.
AB - Efficient semantic segmentation of large-scale three-dimensional (3D) point clouds is an essential capability for intelligent robots to perceive their surrounding environment. However, due to expensive sampling processes or time-consuming pre/post-processing steps, most current solutions are inefficient or limited in scale. In this paper, we propose a novel framework, ThickSeg, to efficiently assign semantic labels to large-scale point clouds. ThickSeg consists of three main steps: First, it projects raw point clouds onto a multi-layer image with a random-hit strategy to efficiently preserve more local geometric features. Second, the projected multi-layer image is fed into a Self-Sorting 3D Convolutional Neural Network (SS-3DCNN) to predict grid-wise semantics, which are subsequently projected back to their corresponding 3D points. Finally, the labels of occluded points are determined by an iterative and accumulative post-processing mechanism, avoiding time-consuming explicit 3D neighborhood searching. We validate our approach on two well-known public benchmarks (SemanticKITTI and KITTI), where ThickSeg achieves state-of-the-art results and is more efficient than previous methods. Our detailed ablation study shows how each component contributes to the final performance.
KW - 3D point cloud
KW - Convolutional neural network
KW - Large scale
KW - Semantic segmentation
UR - https://www.scopus.com/pages/publications/85103322014
U2 - 10.1016/j.imavis.2021.104161
DO - 10.1016/j.imavis.2021.104161
M3 - Article
AN - SCOPUS:85103322014
SN - 0262-8856
VL - 108
JO - Image and Vision Computing
JF - Image and Vision Computing
M1 - 104161
ER -