Multimodal Transformation for Small-Scale 3-D Object Detection in Industrial Scenarios

  • Junhua Sun
  • Shixiang Ma
  • Jie Zhang*
  • *Corresponding author for this work

Research output: Contribution to journal (Article, peer-reviewed)

Abstract

Detecting small, information-scarce objects against complex 3-D backgrounds is a critical yet challenging task in industrial scenarios. Existing multimodal approaches exploit the complementary information in point clouds and intensity data to improve object representation, but they remain limited in fine-grained multimodal alignment and in precise 3-D position regression. This article proposes a 3-D object detection network based on feature fusion between point clouds and multiview images. First, a dense multimodal feature fusion (DMFF) module establishes fine-grained point-to-pixel correspondence and integrates the multimodal feature channels. Second, a normalized 3-D positional embedding generator enhances a transformer-based detection head, improving localization accuracy through refined positional encoding (PE) of the fused features. Experiments on a multimodal industrial dataset show that the proposed method achieves state-of-the-art performance, with an AP of 0.903 and a recall of 94.10%, improvements of 6.35% and 3.58%, respectively, over the best existing method. On the proposed metrics 3-D mATE and 3-D mASE, it achieves 2.66 and 0.85 mm, improvements of 19.64% and 29.75%, respectively, over the best existing method.
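
The abstract names two mechanisms without giving their formulation: point-to-pixel correspondence for fusing point and image features, and a normalized 3-D positional embedding for the transformer head. The sketch below shows one generic form of each, under stated assumptions: a pinhole camera with known intrinsics K and extrinsics T for every view, nearest-pixel feature lookup averaged over the views in which a point is visible, channel concatenation as the fusion step, and a fixed sinusoidal encoding of coordinates normalized to an assumed region of interest. All function and parameter names (project_points, gather_pixel_features, fuse_multiview, normalized_pos_embed, roi_min, roi_max, num_freqs) are hypothetical. This is not the authors' DMFF module or embedding generator, which the abstract does not specify.

```python
import numpy as np


def project_points(points_xyz, K, T_cam_from_world):
    """Project N x 3 world points into a view with an assumed pinhole model.
    Returns pixel coords (N x 2) and camera-frame depth (N,)."""
    n = points_xyz.shape[0]
    homo = np.hstack([points_xyz, np.ones((n, 1))])      # N x 4 homogeneous
    cam = (T_cam_from_world @ homo.T).T[:, :3]           # N x 3 in camera frame
    depth = cam[:, 2]
    uvw = (K @ cam.T).T                                  # N x 3 before division
    uv = np.full((n, 2), -1.0)                           # sentinel for depth <= 0
    front = depth > 0
    uv[front] = uvw[front, :2] / uvw[front, 2:3]
    return uv, depth


def gather_pixel_features(feat_map, uv, depth):
    """Nearest-pixel lookup of C-dim image features (C x H x W map).
    Points behind the camera or outside the image get zero features."""
    C, H, W = feat_map.shape
    u = np.rint(uv[:, 0]).astype(int)
    v = np.rint(uv[:, 1]).astype(int)
    valid = (depth > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    out = np.zeros((uv.shape[0], C))
    out[valid] = feat_map[:, v[valid], u[valid]].T      # M x C for visible points
    return out, valid


def fuse_multiview(points_xyz, point_feats, views):
    """views: list of (feat_map, K, T). Average image features over the views
    where each point is visible, then concatenate with the point features."""
    acc = np.zeros((points_xyz.shape[0], views[0][0].shape[0]))
    count = np.zeros(points_xyz.shape[0])
    for feat_map, K, T in views:
        uv, depth = project_points(points_xyz, K, T)
        f, valid = gather_pixel_features(feat_map, uv, depth)
        acc += f
        count += valid
    img_feats = acc / np.maximum(count, 1)[:, None]
    return np.concatenate([point_feats, img_feats], axis=1)


def normalized_pos_embed(points_xyz, roi_min, roi_max, num_freqs=4):
    """Normalize coords to [0, 1] inside an assumed ROI, then apply a
    sinusoidal encoding per axis -> N x (3 * 2 * num_freqs)."""
    norm = np.clip((points_xyz - roi_min) / (roi_max - roi_min), 0.0, 1.0)
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi        # F frequencies
    angles = norm[:, :, None] * freqs                    # N x 3 x F
    emb = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return emb.reshape(points_xyz.shape[0], -1)


# Usage with synthetic data: 100 points in front of a single camera.
rng = np.random.default_rng(0)
roi_min = np.array([-0.05, -0.05, 0.3])
roi_max = np.array([0.05, 0.05, 0.5])
pts = rng.uniform(roi_min, roi_max, size=(100, 3))
pfeat = rng.normal(size=(100, 8))
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
T = np.eye(4)
views = [(rng.normal(size=(16, 480, 640)), K, T)]
fused = fuse_multiview(pts, pfeat, views)
pe = normalized_pos_embed(pts, roi_min, roi_max)
print(fused.shape, pe.shape)  # (100, 24) (100, 24)
```

Bilinear sampling instead of nearest-pixel lookup, and a learned MLP in place of the fixed sinusoidal encoding, are common alternatives; the abstract does not say which choices the article makes.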

Original language: English
Article number: 2542210
Journal: IEEE Transactions on Instrumentation and Measurement
Volume: 74
State: Published (2025)

Keywords

  • 3-D position embedding
  • 3-D small object detection
  • component inspection
  • multiview images
  • point cloud-image fusion
