Abstract
Reconstructing industrial mechanical parts from a single image is challenging when the output must be a CAD-compatible triangular mesh that is usable in downstream CAD/CAE workflows after standard cleanup. We propose DOGREAT3D, a CAD-oriented mesh reconstruction pipeline that combines pose-conditioned multi-view diffusion with reliability-aware multimodal fusion and topology-preserving test-time refinement to enforce cross-view geometric consistency. Starting from one image (and an optional short industrial prompt), DOGREAT3D synthesizes posed views, derives geometry cues via cross-view depth alignment, and fuses RGB evidence with depth-derived normals (and optional text priors) through confidence-weighted cross-modal attention. It then initializes a watertight mesh using screened Poisson reconstruction and refines geometry and appearance with a four-stage, topology-preserving schedule that retains sharp features and sensitive thin structures. On a wheel-hub benchmark of 1500 parts, DOGREAT3D reduces Chamfer Distance by 14.1% and improves F-score@1 mm from 0.453 to 0.526 over the strongest baseline under a unified evaluation protocol. On unseen part categories and on GSO (Google Scanned Objects), the geometric components transfer beyond the training distribution, improving F-score@τ from 0.647 to 0.667.
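The two reported metrics, Chamfer Distance and F-score@τ, can be illustrated with a minimal sketch. It assumes both meshes have already been sampled into point sets in millimetres, and it uses one common convention (the sum of the two mean nearest-neighbour distances, unsquared); the abstract does not specify the paper's exact evaluation protocol, so the function names and the toy points below are purely illustrative.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def nearest_distances(src: Sequence[Point], dst: Sequence[Point]) -> List[float]:
    """For each point in src, Euclidean distance to its nearest neighbour in dst."""
    # Brute-force O(len(src) * len(dst)) search; fine for a small illustration.
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(pred: Sequence[Point], gt: Sequence[Point]) -> float:
    """Symmetric Chamfer Distance (assumed convention: sum of the two mean distances)."""
    d_pred_to_gt = nearest_distances(pred, gt)
    d_gt_to_pred = nearest_distances(gt, pred)
    return (sum(d_pred_to_gt) / len(d_pred_to_gt)
            + sum(d_gt_to_pred) / len(d_gt_to_pred))


def f_score(pred: Sequence[Point], gt: Sequence[Point], tau: float) -> float:
    """F-score at threshold tau: harmonic mean of precision and recall."""
    d_pred_to_gt = nearest_distances(pred, gt)
    d_gt_to_pred = nearest_distances(gt, pred)
    # Precision: share of predicted points within tau of the ground truth.
    precision = sum(d <= tau for d in d_pred_to_gt) / len(d_pred_to_gt)
    # Recall: share of ground-truth points within tau of the prediction.
    recall = sum(d <= tau for d in d_gt_to_pred) / len(d_gt_to_pred)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (mm): two predicted points are 0.5 mm off, one is 3 mm off.
gt_points = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0)]
pred_points = [(0.0, 0.0, 0.5), (10.0, 0.0, 0.5), (0.0, 10.0, 3.0)]

cd = chamfer_distance(pred_points, gt_points)
f1 = f_score(pred_points, gt_points, tau=1.0)
print(f"Chamfer Distance: {cd:.4f} mm, F-score@1 mm: {f1:.4f}")
```

In this toy case two of the three points lie within 1 mm in each direction, so precision and recall are both 2/3. A real evaluation would sample many more points per mesh and use a spatial index such as a k-d tree instead of the brute-force search.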
| Original language | English |
|---|---|
| Pages (from-to) | 586-599 |
| Number of pages | 14 |
| Journal | Journal of Manufacturing Systems |
| Volume | 86 |
| State | Published - Jun 2026 |
Keywords
- 3D model generation
- Geometric optimization
- Multimodal integration
Title
DOGREAT3D: Multimodal 3D reconstruction of complex mechanical parts with diffusion models and geometric refinement strategies