Survey on semantic segmentation in 3D point cloud scenes (三维点云场景语义分割研究进展)

  • Zhijiang Qiu
  • Lin Zhang
  • Yao Yao
  • Xiaoqing Feng*
  • Juntao Gao
  • *Corresponding author for this work
  • Daqing Petroleum Institute
  • Zhejiang University of Finance and Economics

Research output: Contribution to journal, Article, peer-reviewed

Abstract

With the continuous advancement of depth sensing technology, particularly the widespread application of laser scanners across diverse scenarios, 3D point cloud technology is playing a pivotal role in an increasing number of fields. These fields include, but are not limited to, autonomous driving, robotics, geographic information systems, manufacturing, building information modeling, cultural heritage preservation, virtual reality, and augmented reality. As a high-precision, dense, and detailed form of spatial data representation, 3D point clouds can accurately capture various types of information, including the geometric shape, spatial structure, surface texture, and environmental layout of objects. Consequently, the processing, analysis, and comprehension of point cloud data are significant in these applications, especially point cloud segmentation, which serves as the foundation for advanced tasks such as object recognition, scene understanding, map construction, and dynamic environment monitoring. This study conducts a comprehensive review of current mainstream 3D point cloud segmentation methods from multiple perspectives, dissecting the latest advancements in this research domain. It begins with a detailed analysis of the fundamentals of 3D point cloud segmentation, covering datasets, performance evaluation metrics, sampling methods, and feature extraction techniques. This study summarizes the mainstream publicly available point cloud datasets, including ShapeNet, Semantic3D, S3DIS, and ScanNet, examining the characteristics, annotation forms, application scenarios, and technical challenges associated with each dataset. In addition, it examines the performance evaluation metrics commonly used in the semantic segmentation of point cloud scenes, including overall accuracy, mean class accuracy, and mean intersection over union.
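As a hedged illustration of the three metrics named above, the following minimal NumPy sketch computes overall accuracy (OA), mean class accuracy (mAcc), and mean intersection over union (mIoU) from per-point labels. The function name `segmentation_metrics`, its arguments, and the choice to skip classes that never occur when averaging are assumptions made for this example, not details taken from the survey.

```python
import numpy as np


def segmentation_metrics(y_true, y_pred, num_classes):
    """Return (overall accuracy, mean class accuracy, mean IoU).

    y_true, y_pred: integer class labels in [0, num_classes), one per point.
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    flat = y_true * num_classes + y_pred
    cm = np.bincount(flat, minlength=num_classes ** 2)
    cm = cm.reshape(num_classes, num_classes)

    tp = np.diag(cm).astype(float)   # correctly labeled points per class
    gt_count = cm.sum(axis=1)        # points per ground-truth class
    pred_count = cm.sum(axis=0)      # points per predicted class
    union = gt_count + pred_count - tp

    # Overall accuracy: fraction of all points labeled correctly.
    oa = tp.sum() / cm.sum()

    # Mean class accuracy: per-class recall, averaged over classes that
    # appear in the ground truth (assumption: absent classes are skipped).
    present = gt_count > 0
    macc = np.mean(tp[present] / gt_count[present])

    # Mean IoU: per-class intersection over union, averaged over classes
    # that appear in either the ground truth or the predictions.
    valid = union > 0
    miou = np.mean(tp[valid] / union[valid])
    return oa, macc, miou
```

Because OA is dominated by frequent classes, it can stay high while rare classes are mislabeled; the two class-averaged metrics weight every class equally.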
These metrics provide an effective means of quantifying and comparing model performance, facilitating comprehensive evaluation and improvement across different tasks and scenarios. For the data preprocessing phase, this study systematically summarizes prevalent point cloud sampling strategies. Because 3D point cloud data are typically large in scale and irregularly distributed, a suitable sampling method is essential for reducing computational cost and improving model training efficiency. This study introduces strategies such as farthest point sampling, random sampling, and grid sampling, analyzing their application scenarios, advantages, disadvantages, and specific implementations in various tasks. Furthermore, it discusses feature extraction techniques for 3D point clouds, including global feature extraction, local feature extraction, and the fusion of global and local features. Effective feature extraction provides more discriminative representations for subsequent segmentation tasks, helping the model improve object recognition and scene understanding. Building on this foundation, this study systematically reviews 3D point cloud segmentation methods from four perspectives: point-based, voxel-based, view-based, and multimodal fusion methods. First, point-based methods directly process each point within the point cloud, maintaining the high resolution and density of the data while avoiding information loss. Point-based methods are further subdivided into multilayer perceptron (MLP)-based methods, Transformer-based models, graph convolutional network-based methods, and other related approaches. Each category exhibits unique advantages in different application scenarios. For example, MLP-based methods are effective at capturing the local features of point clouds, while Transformer-based models excel at handling long-range dependencies and global relationships.
Despite the strong performance of point-based methods, their direct processing of a large number of 3D points results in high computational complexity and relatively low efficiency on large-scale point cloud scenes. Second, this study presents voxel-based methods, which partition point cloud data into regular 3D grids (voxels), effectively reducing data size and simplifying subsequent computations. This approach gives point cloud data a regular structure and provides relatively stable performance in large-scale scenes. It is particularly applicable to scenarios with large scene sizes but low resolution requirements. However, the inevitable information loss and reduction in spatial resolution during voxelization limit performance on fine-grained tasks. Third, view-based methods process a 3D point cloud by projecting it onto a 2D plane, leveraging mature 2D image processing techniques and convolutional neural networks. This transforms the point cloud segmentation task into a traditional image segmentation problem and improves processing efficiency, particularly in scenarios where point cloud density is high and rapid processing is required. However, projecting 3D information onto 2D space may lose spatial geometric information, which in certain applications can result in lower accuracy than methods that handle 3D point clouds directly. Lastly, this study explores multimodal fusion methods, which combine various forms of data, such as point clouds, voxels, and views, exploiting the complementarity of different modalities in scene understanding to enhance the accuracy, robustness, and generalization ability of point cloud segmentation. Subsequently, this study conducts a detailed analysis and comparison of the experimental results of different methods. Based on various datasets and performance evaluation metrics, it reveals the strengths and weaknesses of each method in diverse application scenarios.
For example, point-based methods excel in fine segmentation tasks and can capture subtle geometric information, while voxel-based and view-based methods offer higher processing efficiency on large-scale point cloud scenes. Through this comparative analysis of experimental results, the study provides a reference for point cloud segmentation tasks across different application scenarios. Finally, this study summarizes the major challenges currently facing 3D point cloud segmentation, including the sparsity and irregularity of point cloud data, the influence of noise and missing points, insufficient generalization ability across diverse scenes, and the demand for real-time processing. This study also anticipates future research directions and proposes measures to further advance point cloud segmentation technology: a deeper understanding of the complex semantic structures of point cloud data through the integration of large language models, the introduction of semi-supervised and unsupervised learning methods to reduce reliance on labeled data, and improvements in real-time performance and computational efficiency. We hope that this comprehensive review can provide a systematic reference for researchers and industrial practitioners in the field of point cloud technology, facilitating the implementation and development of 3D point cloud technology in a broader range of practical applications.
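To make the three sampling strategies named in the abstract concrete (random sampling, farthest point sampling, and grid sampling), here is a minimal NumPy sketch. The function names, parameters, and the choice to represent each occupied grid cell by its centroid are assumptions for illustration only; the survey itself may describe different variants.

```python
import numpy as np


def random_sampling(points, k, rng):
    """Pick k distinct points uniformly at random."""
    idx = rng.choice(len(points), size=k, replace=False)
    return points[idx]


def farthest_point_sampling(points, k, start=0):
    """Greedily pick k points, each farthest from those already chosen."""
    selected = np.empty(k, dtype=np.int64)
    selected[0] = start
    # min_dist[i]: squared distance from point i to its nearest selected point.
    min_dist = np.sum((points - points[start]) ** 2, axis=1)
    for i in range(1, k):
        nxt = int(np.argmax(min_dist))
        selected[i] = nxt
        dist = np.sum((points - points[nxt]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist)
    return points[selected]


def grid_sampling(points, cell_size):
    """Keep one representative (the centroid) per occupied cubic cell."""
    origin = points.min(axis=0)
    cells = np.floor((points - origin) / cell_size).astype(np.int64)
    # inverse[i] is the index of the unique cell that point i falls in.
    _, inverse = np.unique(cells, axis=0, return_inverse=True)
    inverse = inverse.ravel()
    n_cells = int(inverse.max()) + 1
    sums = np.zeros((n_cells, points.shape[1]))
    np.add.at(sums, inverse, points)          # accumulate coordinates per cell
    counts = np.bincount(inverse, minlength=n_cells)
    return sums / counts[:, None]
```

In this sketch, random sampling is the cheapest but follows the input density, farthest point sampling spreads samples evenly at a cost of O(N·k) distance evaluations, and grid sampling bounds the output density by the cell size rather than by a fixed point count.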

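Translated title of the contribution: Survey on semantic segmentation in 3D point cloud scenes
Original language: Traditional Chinese
Pages (from-to): 2325-2342
Number of pages: 18
Journal: Journal of Image and Graphics
Volume: 30
Issue number: 7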
DOI
Publication status: Published - Jul 2025
Externally published: Yes

Keywords

  • 3D point cloud
  • deep learning
  • feature extraction
  • point cloud scene semantic segmentation
  • sampling method
