Editing 3D Scenes via Text Prompts without Retraining

  • Shuangkang Fang
  • Yufeng Wang*
  • Yi Hsuan Tsai
  • Wenrui Ding
  • Yi Yang
  • Shuchang Zhou
  • Ming Hsuan Yang
  • *Corresponding author for this work
  • Beihang University
  • Atmanity Inc
  • StepFun
  • University of California Merced

Research output: Contribution to journal › Article › peer-review

Abstract

Numerous diffusion models have been developed for 2D image synthesis and editing, and they have recently been extended to 3D scene editing tasks. However, 3D scene editing is still in its early stages, and the challenges of scene representation and multi-view consistency remain to be addressed. A notable limitation of existing approaches is that they require specific modules for different edits and model retraining for each scene. To tackle these issues, we propose a novel and versatile text-driven 3D scene editing method, termed DN2N, which produces editing results directly, without retraining. Our method employs off-the-shelf text-based 2D image editing models to modify the multi-view images of a 3D scene. A content filtering process is then applied to discard poorly edited images that disrupt 3D consistency. We treat the remaining inconsistency as a problem of removing noise perturbations and solve it by generating training data with similar perturbation characteristics. We develop a versatile NeRF model structure and propose two novel cross-view regularization terms to help DN2N mitigate these perturbations. Empirical results show that our method achieves multiple editing types based solely on text prompts, including but not limited to appearance editing, weather transition, object changing, and style transfer. Most importantly, DN2N exhibits versatile editing capabilities, eliminating the need to customize or retrain editing models for specific scenes or editing types. Moreover, DN2N achieves a total editing time comparable to that of the 3DGS-based editing method, enhancing its practical value.

Original language: English
Journal: IEEE Transactions on Visualization and Computer Graphics
State: Accepted/In press - 2026

Keywords

  • 3D object editing
  • 3D scene editing
  • IBRNet
  • NeRF
  • diffusion model
  • text prompts
