RailFusion: A Lidar-Camera Data Interaction Network for 3-D Railway Object Detection

Research output: Contribution to journal › Article › peer-review

Abstract

Accurate 3D object detection is vital to environment perception for the safe operation of autonomous trains, particularly given the complexity of railway environments and the difficulty of detecting objects of variable sizes and objects at long range. This study introduces RailFusion, a LiDAR-camera fusion network for integrating multi-modal features. RailFusion consists of two main modules: Cross-Domain Feature Extraction (CDFE) and Multi-Modal Fusion (MMF). Specifically, the CDFE module employs a novel feature extraction method that enhances cross-domain feature interaction by exploiting LiDAR spatial depth and image semantic information. The MMF module uses deformable attention to align and fuse multi-modal features. In addition, a channel normalization fusion is proposed to assign channel weights. Experimental results show that the mean average precision (mAP) of the proposed RailFusion reaches 57.2%, which is 8.4% higher than that of the baseline 3D object detection network BEVFusion. Moreover, the results show that RailFusion performs well on long-range detection as well as on short-range objects of varying sizes. These findings indicate that RailFusion is readily applicable to 3D object detection in railway environments.
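The abstract's channel normalization fusion can be illustrated with a minimal sketch. The paper's exact formulation is not given here, so the weighting scheme below (per-channel activation energy, normalized across the two modalities) is a hypothetical stand-in; the function name, shapes, and statistics are assumptions for illustration only.

```python
import numpy as np

def channel_norm_fusion(feat_cam, feat_lidar, eps=1e-6):
    """Illustrative channel-weighted fusion of two BEV feature maps.

    feat_cam, feat_lidar: arrays of shape (C, H, W), one per modality.
    Per-channel weights are derived from each modality's mean activation
    magnitude and normalized so each channel's pair of weights sums to one.
    NOTE: hypothetical sketch, not the paper's exact formulation.
    """
    # Per-channel L1 energy for each modality, shape (C,)
    e_cam = np.abs(feat_cam).mean(axis=(1, 2))
    e_lidar = np.abs(feat_lidar).mean(axis=(1, 2))
    # Normalize so the two weights per channel sum to (approximately) 1
    total = e_cam + e_lidar + eps
    w_cam = (e_cam / total)[:, None, None]
    w_lidar = (e_lidar / total)[:, None, None]
    # Convex per-channel combination of the two modalities
    return w_cam * feat_cam + w_lidar * feat_lidar

# Example: fuse random 64-channel BEV maps on a 180x180 grid
cam = np.random.rand(64, 180, 180).astype(np.float32)
lidar = np.random.rand(64, 180, 180).astype(np.float32)
fused = channel_norm_fusion(cam, lidar)
print(fused.shape)  # (64, 180, 180)
```

The intent is that channels where one modality carries more signal (e.g. LiDAR depth at long range, image semantics for small objects) receive a proportionally larger weight in the fused BEV representation.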

Original language: English
Pages (from-to): 12761-12773
Number of pages: 13
Journal: IEEE Transactions on Intelligent Transportation Systems
Volume: 26
Issue number: 8
DOIs
State: Published - 2025

Keywords

  • BEV representation
  • Data fusion
  • LiDAR
  • object detection
  • railway safety
