Taking Language Embedded 3D Gaussian Splatting into the Wild

  • Yuze Wang
  • Junyi Wang
  • Yue Qi*
  • *Corresponding author for this work
  • Beihang University
  • Shandong University

Research output: Contribution to journal › Article › peer-review

Abstract

Recent advances in leveraging large-scale Internet photo collections for 3D reconstruction have enabled immersive virtual exploration of landmarks and historic sites worldwide. However, existing methods primarily focus on visual appearance reconstruction, often overlooking the interactive semantic understanding of these 3D scenes (e.g., identifying specific building parts or scene details), which remains largely confined to browsing static text-image pairs. Therefore, can we draw inspiration from 3D in-the-wild reconstruction techniques and use unconstrained photo collections to create an immersive approach for comprehensive 3D scene understanding beyond mere visual appearance? To this end, we extend language embedded 3D Gaussian splatting (3DGS) and propose a novel framework for open-vocabulary scene understanding from unconstrained photo collections. Specifically, we first render multiple appearance images from the same viewpoint as the unconstrained image with the reconstructed radiance field, then extract multi-appearance CLIP features and two types of language feature uncertainty maps (transient and appearance uncertainty) derived from the multi-appearance features to guide the subsequent optimization process. Next, we propose a transient uncertainty-aware autoencoder, a multi-appearance language field 3DGS representation, and a post-ensemble strategy to effectively compress, learn, and fuse language features from multiple appearances. Finally, to quantitatively evaluate our method, we introduce PT-OVS, a new benchmark dataset for assessing open-vocabulary segmentation performance on unconstrained photo collections. Experimental results show that our method outperforms existing methods, delivering accurate open-vocabulary segmentation and enabling applications such as interactive roaming with open-vocabulary queries, architectural style pattern recognition, and 3D scene editing.

Original language: English
Journal: IEEE Transactions on Visualization and Computer Graphics
State: Accepted/In press - 2026

Keywords

  • 3D Gaussian Splatting
  • In-the-wild Scene Understanding
  • Unconstrained Photo Collection
