Skip to main navigation Skip to search Skip to main content

Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding

  • Jin Chuan Shi
  • , Miao Wang*
  • , Hao Bin Duan
  • , Shao Hua Guan
  • *Corresponding author for this work
  • Beihang University
  • Zhongguancun Laboratory

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Open-vocabulary querying in 3D space is challenging but essential for scene understanding tasks such as ob-ject localization and segmentation. Language-embedded scene representations have made progress by incorporating language features into 3D spaces. However, their effi-cacy heavily depends on neural networks that are resource-intensive in training and rendering. Although recent 3D Gaussians offer efficient and high-quality novel view syn-thesis, directly embedding language features in them leads to prohibitive memory usage and decreased performance. In this work, we introduce Language Embedded 3D Gaus-sians, a novel scene representation for open-vocabulary query tasks. Instead of embedding high-dimensional raw semantic features on 3D Gaussians, we propose a dedicated quantization scheme that drastically alleviates the mem-ory requirement, and a novel embedding procedure that achieves smoother yet high accuracy query, countering the multi-view feature inconsistencies and the high-frequency inductive bias in point-based representations. Our compre-hensive experiments show that our representation achieves the best visual quality and language querying accuracy across current language-embedded representations, while maintaining real-time rendering frame rates on a single desktop GPU. Project page: https://buaavrcg.github.io/LEGaussians/.

Original languageEnglish
Title of host publicationProceedings - 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
PublisherIEEE Computer Society
Pages5333-5343
Number of pages11
ISBN (Electronic)9798350353006
ISBN (Print)9798350353006
DOIs
StatePublished - 2024
Event2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024 - Seattle, United States
Duration: 16 Jun 202422 Jun 2024

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
ISSN (Print)1063-6919

Conference

Conference2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024
Country/TerritoryUnited States
CitySeattle
Period16/06/2422/06/24

Fingerprint

Dive into the research topics of 'Language Embedded 3D Gaussians for Open-Vocabulary Scene Understanding'. Together they form a unique fingerprint.

Cite this