TY - GEN
T1 - Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding
AU - Shi, Jin Chuan
AU - Wang, Miao
AU - Duan, Hao Bin
AU - Guan, Shao Hua
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. Project page: https://buaavrcg.github.io/LEGaussians/.
AB - Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as object localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their efficacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view synthesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaussians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the memory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our comprehensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. Project page: https://buaavrcg.github.io/LEGaussians/.
UR - https://www.scopus.com/pages/publications/85202070294
U2 - 10.1109/CVPR52733.2024.00510
DO - 10.1109/CVPR52733.2024.00510
M3 - Conference contribution
AN - SCOPUS:85202070294
SN - 9798350353006
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 5333
EP - 5343
BT - Proceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PB - IEEE Computer Society
T2 - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Y2 - 16 June 2024 through 22 June 2024
ER -