TY - JOUR
T1 - Efficient 3D object annotation via vision-derived pseudo-LiDAR and Vision Language Model (VLM) validation
AU - Ma, Yalong
AU - Yao, Ziying
AU - Liu, Xuan
AU - Xiong, Zhongxia
AU - He, Xiaozheng
AU - Wu, Xinkai
N1 - Publisher Copyright:
© 2025 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
PY - 2026/1
Y1 - 2026/1
N2 - To advance autonomous driving, accurate 3D object annotation is crucial for target recognition, environment perception, and high-precision map construction. However, producing high-quality 3D annotated data is costly and time-consuming. In particular, for sparse point cloud data, it is both labor-intensive and error-prone to annotate 3D objects. To address this challenge, this paper proposes an efficient automated annotation pipeline that integrates pseudo-point cloud generation with validation using a vision language model (VLM). Our approach supplements sparse point cloud data, generates pseudo-labels, and leverages a VLM to validate and filter annotations, thereby creating a closed-loop automated system. Experiments on a real-world dataset collected by an autonomous vehicle demonstrate significant improvements in annotation accuracy and efficiency.
AB - To advance autonomous driving, accurate 3D object annotation is crucial for target recognition, environment perception, and high-precision map construction. However, producing high-quality 3D annotated data is costly and time-consuming. In particular, for sparse point cloud data, it is both labor-intensive and error-prone to annotate 3D objects. To address this challenge, this paper proposes an efficient automated annotation pipeline that integrates pseudo-point cloud generation with validation using a vision language model (VLM). Our approach supplements sparse point cloud data, generates pseudo-labels, and leverages a VLM to validate and filter annotations, thereby creating a closed-loop automated system. Experiments on a real-world dataset collected by an autonomous vehicle demonstrate significant improvements in annotation accuracy and efficiency.
KW - 3D object detection
KW - Automated data annotation
KW - Multi-sensor fusion
KW - Pseudo label learning
KW - Vision language model (VLM)
UR - https://www.scopus.com/pages/publications/105029692618
U2 - 10.1016/j.trc.2025.105429
DO - 10.1016/j.trc.2025.105429
M3 - Article
AN - SCOPUS:105029692618
SN - 0968-090X
VL - 182
JO - Transportation Research Part C: Emerging Technologies
JF - Transportation Research Part C: Emerging Technologies
M1 - 105429
ER -