An Attention-Based Fusion Method for Target Detection in Complex Scenes

Research output: Contribution to journal › Article › peer-review

Abstract

Target detection technology is crucial for the practical implementation of unmanned driving. Mining areas are a typical complex scenario and pose extreme target detection challenges because of harsh environmental conditions and widely varying target sizes. A single sensor is inadequate for perception tasks in such environments. This study presents a fusion perception system that integrates LiDAR and camera sensors. The proposed algorithm is built on a deep learning network, MineNet, which employs a Swin Transformer for layered image feature extraction and PointNet++ for layered point cloud feature extraction. Rigid body transformation and bilinear interpolation are used to fuse features from deep and shallow levels. To enhance fusion effectiveness, an attention mechanism module is introduced to assign weights to point cloud features. A mining area dataset is then established and used for comparative testing against various algorithms. The experimental results show that the proposed algorithm achieves an average precision of 81.77% and a frame rate of 11.49 frames per second on the mining area dataset, surpassing previous algorithms such as AVOD, F-PointNet, and 3D-CVF. These findings provide technical support for the practical application of unmanned driving in mining areas.
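The abstract names three fusion ingredients: a rigid body transformation, interpolation of image features at projected point locations, and attention weights on point cloud features. The following is a minimal NumPy sketch of how such steps are commonly combined, not the MineNet implementation, whose details the abstract does not give. The function names, the pinhole intrinsic matrix `K`, the extrinsics `R` and `t`, and the sigmoid gate parameters `W_att` and `b_att` are all assumptions introduced for illustration.

```python
import numpy as np

def project_points(points_lidar, R, t, K):
    """Map LiDAR points (N, 3) to pixel coordinates (N, 2) and camera-frame depth (N,).

    Assumes a rigid LiDAR-to-camera transform (R, t) and pinhole intrinsics K.
    """
    points_cam = points_lidar @ R.T + t   # rigid body transform into the camera frame
    uvw = points_cam @ K.T                # homogeneous pixel coordinates
    return uvw[:, :2] / uvw[:, 2:3], points_cam[:, 2]

def bilinear_sample(feature_map, uv):
    """Sample an (H, W, C) feature map at continuous pixel coords uv (N, 2).

    uv[:, 0] is the column (u) and uv[:, 1] is the row (v); coords are clamped to the map.
    """
    H, W, _ = feature_map.shape
    u = np.clip(uv[:, 0], 0.0, W - 1.0)
    v = np.clip(uv[:, 1], 0.0, H - 1.0)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    u1 = np.minimum(u0 + 1, W - 1)
    v1 = np.minimum(v0 + 1, H - 1)
    du = (u - u0)[:, None]
    dv = (v - v0)[:, None]
    # Interpolate along u on the two neighbouring rows, then along v.
    top = feature_map[v0, u0] * (1 - du) + feature_map[v0, u1] * du
    bottom = feature_map[v1, u0] * (1 - du) + feature_map[v1, u1] * du
    return top * (1 - dv) + bottom * dv

def attention_fuse(point_feats, image_feats, W_att, b_att):
    """Weight point features by a sigmoid gate (hypothetical form), then concatenate.

    point_feats: (N, Cp), image_feats: (N, Ci), W_att: (Cp + Ci, 1), b_att: scalar.
    """
    joint = np.concatenate([point_feats, image_feats], axis=1)
    weights = 1.0 / (1.0 + np.exp(-(joint @ W_att + b_att)))  # (N, 1), values in (0, 1)
    fused = np.concatenate([weights * point_feats, image_feats], axis=1)
    return fused, weights
```

In this sketch, `project_points` also returns the camera-frame depth so that a caller can discard points behind the camera before sampling. A real network would learn the gate parameters and repeat the fusion at several feature levels.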

Original language: English
Pages (from-to): 3-11
Number of pages: 9
Journal: IEEE Instrumentation and Measurement Magazine
Volume: 28
Issue number: 3
State: Published - 2025
