TY - GEN
T1 - Towards Point Cloud Compression for Machine Perception
T2 - 2nd International Workshop on Generalizing from Limited Resources in the Open World, GLOW 2024, Held in Conjunction with International Joint Conference on Artificial Intelligence, IJCAI 2024
AU - Liu, Lei
AU - Hu, Zhihao
AU - Chen, Zhenghao
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024.
PY - 2024
Y1 - 2024
N2 - Point cloud compression has garnered significant interest in computer vision. However, existing algorithms primarily cater to human vision, while most point cloud data is utilized for machine vision tasks. To address this, we propose a point cloud compression framework that simultaneously handles both human and machine vision tasks. Our framework learns a scalable bit-stream, using only subsets for different machine vision tasks to save bit-rate, while employing the entire bit-stream for human vision tasks. Building on mainstream octree-based frameworks like VoxelContext-Net, OctAttention, and G-PCC, we introduce a new octree depth-level predictor. This predictor adaptively determines the optimal depth level for each octree constructed from a point cloud, controlling the bit-rate for machine vision tasks. For simpler tasks (e.g., classification) or objects/scenarios, we use fewer depth levels with fewer bits, saving bit-rate. Conversely, for more complex tasks (e.g., segmentation) or objects/scenarios, we use deeper depth levels with more bits to enhance performance. Experimental results on various datasets (e.g., ModelNet10, ModelNet40, ShapeNet, ScanNet, and KITTI) show that our point cloud compression approach improves performance for machine vision tasks without compromising human vision quality.
KW - Point Cloud Compression
KW - Scalable Coding for Machines
UR - https://www.scopus.com/pages/publications/85200717648
U2 - 10.1007/978-981-97-6125-8_1
DO - 10.1007/978-981-97-6125-8_1
M3 - Conference contribution
AN - SCOPUS:85200717648
SN - 9789819761241
T3 - Communications in Computer and Information Science
SP - 3
EP - 17
BT - Generalizing from Limited Resources in the Open World - 2nd International Workshop, GLOW 2024, Held in Conjunction with IJCAI 2024, Proceedings
A2 - Guo, Jinyang
A2 - Ma, Yuqing
A2 - Ding, Yifu
A2 - Zheng, Xingyu
A2 - He, Changyi
A2 - Liu, Xianglong
A2 - Gong, Ruihao
A2 - Lu, Yantao
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 3 August 2024 through 3 August 2024
ER -