
DOGREAT3D: Multimodal 3D reconstruction of complex mechanical parts with diffusion models and geometric refinement strategies

  • Beihang University
  • University of New South Wales
  • Nanyang Technological University

Research output: Contribution to journal › Article › peer-review

Abstract

Reconstructing industrial mechanical parts from a single image is challenging when the output must be a CAD-compatible triangular mesh that is usable in downstream CAD/CAE workflows after standard cleanup. We propose DOGREAT3D, a CAD-oriented mesh reconstruction pipeline that combines pose-conditioned multi-view diffusion, reliability-aware multimodal fusion, and topology-preserving test-time refinement to enforce cross-view geometric consistency. Starting from one image (and an optional short industrial prompt), DOGREAT3D synthesizes posed views, derives geometry cues via cross-view depth alignment, and fuses RGB evidence with depth-derived normals (and optional text priors) through confidence-weighted cross-modal attention. It then initializes a watertight mesh using screened Poisson reconstruction and refines geometry and appearance with a four-stage schedule that preserves topology, sharp features, and sensitive thin structures. On a wheel-hub benchmark of 1500 parts, DOGREAT3D reduces Chamfer Distance by 14.1% and improves F-score@1 mm from 0.453 to 0.526 over the strongest baseline under a unified evaluation protocol. On unseen part categories and on GSO, the geometric components transfer beyond the training distribution, improving F-score@τ from 0.647 to 0.667.
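
The two metrics reported above, Chamfer Distance and F-score@τ, can be illustrated with a minimal sketch on point sets sampled from a predicted and a reference mesh. This is not the paper's evaluation protocol: the function names are hypothetical, and the choices made here (unsquared Euclidean distances, a sum of the two directional means, a strict threshold) are assumptions that the unified protocol mentioned in the abstract may define differently.

```python
import numpy as np


def nearest_neighbor_dist(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """For each point in a (N, 3), the Euclidean distance to its nearest point in b (M, 3)."""
    diff = a[:, None, :] - b[None, :, :]          # (N, M, 3) pairwise differences
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer Distance: sum of the two mean nearest-neighbor distances (assumed convention)."""
    return float(nearest_neighbor_dist(pred, gt).mean()
                 + nearest_neighbor_dist(gt, pred).mean())


def f_score(pred: np.ndarray, gt: np.ndarray, tau: float) -> float:
    """Harmonic mean of precision and recall at distance threshold tau (same units as the points)."""
    precision = float((nearest_neighbor_dist(pred, gt) < tau).mean())
    recall = float((nearest_neighbor_dist(gt, pred) < tau).mean())
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Toy point sets (illustrative values only, not data from the paper).
gt_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
pred_points = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 2.0]])

cd = chamfer_distance(pred_points, gt_points)
fs = f_score(pred_points, gt_points, tau=1.0)
```

The brute-force pairwise distance matrix is only practical for small point sets; for dense samples a spatial index such as a k-d tree would normally replace it.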

Original language: English
Pages (from-to): 586-599
Number of pages: 14
Journal: Journal of Manufacturing Systems
Volume: 86
State: Published - Jun 2026

Keywords

  • 3D model generation
  • Geometric optimization
  • Multimodal integration
