SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Recently, the pure camera-based Bird's-Eye-View (BEV) perception provides a feasible solution for economical autonomous driving. However, the existing BEV-based multiview 3D detectors generally transform all image features into BEV features, without considering the problem that the large proportion of background information may submerge the object information. In this paper, we propose Semantic-Aware BEV Pooling (SA-BEVPool), which can filter out background information according to the semantic segmentation of image features and transform image features into semantic-aware BEV features. Accordingly, we propose BEV-Paste, an effective data augmentation strategy that closely matches with semantic-aware BEV feature. In addition, we design a Multi-Scale Cross-Task (MSCT) head, which combines task-specific and cross-task information to predict depth distribution and semantic segmentation more accurately, further improving the quality of semantic-aware BEV feature. Finally, we integrate the above modules into a novel multi-view 3D object detection framework, namely SA-BEV Experiments on nuScenes show that SA-BEV achieves state-of-the-art performance. Code has been available at https://github.com/mengtan00/SA-BEV.git.

Original languageEnglish
Title of host publicationProceedings - 2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages3325-3334
Number of pages10
ISBN (Electronic)9798350307184
DOIs
StatePublished - 2023
Event2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023 - Paris, France
Duration: 2 Oct 20236 Oct 2023

Publication series

NameProceedings of the IEEE International Conference on Computer Vision
ISSN (Print)1550-5499

Conference

Conference2023 IEEE/CVF International Conference on Computer Vision, ICCV 2023
Country/TerritoryFrance
CityParis
Period2/10/236/10/23

Fingerprint

Dive into the research topics of 'SA-BEV: Generating Semantic-Aware Bird's-Eye-View Feature for Multi-view 3D Object Detection'. Together they form a unique fingerprint.

Cite this