RM2Occ: Re-Projection Multi-Task Multi-Sensor Fusion for Autonomous Driving 3D Object Detection and Occupancy Perception

Research output: Contribution to journal › Article › peer-review

Abstract

Occupancy prediction plays a crucial role in supporting autonomous driving planning and decision-making. Existing methods typically rely on modular stacking and fusion of object detection, semantic segmentation, and depth estimation to achieve 3D occupancy. However, they neither deeply explore the transformation relationships between 2D and 3D space nor efficiently fuse the complementary characteristics of multi-source sensors. We propose RM2Occ, the first 3D occupancy perception network that integrates multi-sensor fusion grounded in the differing sensing principles of each modality and achieves multi-task learning. To leverage the rich 2D semantic information captured by cameras and lift it into the 3D domain, we first query and populate predefined empty voxels with multi-view image features. We then progressively fuse 3D LiDAR point clouds with these populated voxels through an unbalanced fusion strategy that supplements missing information while suppressing noise. Leveraging IMU data and calibration parameters, we re-project the enriched voxels back onto the 2D image plane in camera coordinates and perform a secondary query against the semantic segmentation results to recover semantic details potentially lost to LiDAR fusion limitations and incomplete voxel querying. Finally, supported by a multi-task detection head, RM2Occ simultaneously performs 3D object detection, semantic segmentation, Bird's Eye View (BEV) detection, and full-scene grid occupancy prediction, enabling comprehensive multi-task output. Extensive experiments and ablation studies on the nuScenes dataset demonstrate that RM2Occ significantly outperforms existing state-of-the-art methods, establishing a new paradigm for accurate and efficient multi-sensor fusion and multi-task perception in autonomous driving scenarios.
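The re-projection step described above (mapping enriched 3D voxels back onto the 2D image plane via calibration parameters) can be illustrated with a minimal pinhole-camera sketch. All function names, argument shapes, and the specific transform convention here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def reproject_voxels_to_image(voxel_centers, T_voxel_to_cam, K, img_hw):
    """Project 3D voxel centers onto a camera image plane (pinhole model).

    voxel_centers : (N, 3) voxel center coordinates in the ego/LiDAR frame
    T_voxel_to_cam: (4, 4) extrinsic transform from the voxel frame to the camera frame
    K             : (3, 3) camera intrinsic matrix
    img_hw        : (H, W) image size, used to mask out-of-view projections
    Returns (N, 2) pixel coordinates and an (N,) boolean visibility mask.
    """
    n = voxel_centers.shape[0]
    homo = np.hstack([voxel_centers, np.ones((n, 1))])   # homogeneous (N, 4)
    cam = (T_voxel_to_cam @ homo.T).T[:, :3]             # points in camera frame
    in_front = cam[:, 2] > 1e-6                          # keep points ahead of the camera
    pix = (K @ cam.T).T
    pix = pix[:, :2] / np.clip(pix[:, 2:3], 1e-6, None)  # perspective divide
    h, w = img_hw
    visible = in_front & (pix[:, 0] >= 0) & (pix[:, 0] < w) \
                       & (pix[:, 1] >= 0) & (pix[:, 1] < h)
    return pix, visible
```

Visible voxels could then be used to index the 2D semantic segmentation map for the secondary query, restoring semantics to voxels whose first-pass features were incomplete.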

Original language: English
Pages (from-to): 20864-20881
Number of pages: 18
Journal: IEEE Transactions on Intelligent Transportation Systems
Volume: 26
Issue number: 11
DOIs
State: Published - Nov 2025

Keywords

  • Autonomous driving
  • multi-sensor fusion
  • multi-task
  • occupancy
  • re-projection
  • semantic segmentation

