Abstract
Perception of 3-D remote sensing scenes plays a crucial role in accurately recognizing and locating ground objects, as it enables a deeper understanding of complex environments by capturing scene geometry, object relationships, and occlusion patterns. Inspired by the powerful multisensor fusion capabilities in autonomous driving, we explore a new task in this article: given a set of multiview images of a 3-D remote sensing scene, we aim to obtain bird's-eye-view (BEV) scene information under the common view area in the world coordinate system. In this work, we focus on the task of semantic segmentation to demonstrate the feasibility of our approach and introduce a BEV modeling technique tailored for remote sensing scenes, which facilitates the projection of 3-D scene details from multiple perspective views onto a BEV. We then utilize a dual-encoder structure based on the vision transformer (ViT) architecture to extract relevant spatial information using self-attention mechanisms. Within the decoder, we employ a feature pyramid network (FPN) to integrate BEV patch encoding with spatial feature residuals, enabling fine-grained segmentation results at the original input resolution. Furthermore, we curated the LEVIR-MDS multidrone segmentation dataset, comprising scenes from ten community-level areas across three continents, totaling 243k images and their corresponding annotated BEV semantic maps, amounting to approximately 500 GB. This dataset serves as a robust benchmark to assess the effectiveness and generalization capability of our proposed method. To our knowledge, this is the first semantic segmentation dataset designed specifically for collaborative multidrone applications. We further show that our method achieves a 12% improvement in mean IoU (mIoU), reaching 69.73%, compared to a pure convolutional network model.
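For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean IoU is commonly computed for a pair of semantic label maps. It is not the authors' evaluation code: the abstract does not state their exact protocol (for example, how absent classes or ignored pixels are handled), and the function name `mean_iou`, its parameters, and the toy label maps are hypothetical.

```python
from itertools import chain


def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in either label map.

    Assumption: classes absent from both maps are skipped rather than
    counted as IoU 0 or 1; the paper's own convention is not stated.
    """
    # Flatten the 2-D label maps (lists of rows) into 1-D sequences.
    p = list(chain.from_iterable(pred))
    t = list(chain.from_iterable(target))
    if len(p) != len(t):
        raise ValueError("pred and target must have the same number of cells")

    inter = [0] * num_classes  # cells where prediction and target agree on class c
    union = [0] * num_classes  # cells labeled c in the prediction or the target
    for a, b in zip(p, t):
        if a == b:
            inter[a] += 1
            union[a] += 1
        else:
            union[a] += 1
            union[b] += 1

    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


# Toy 2x3 BEV label maps with three classes (illustrative values only).
pred = [[0, 0, 1],
        [1, 1, 2]]
target = [[0, 1, 1],
          [1, 1, 2]]
print(mean_iou(pred, target, num_classes=3))  # prints 0.75
```

In the toy example the per-class IoUs are 1/2, 3/4, and 1, so their mean is 0.75.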
| Original language | English |
|---|---|
| Article number | 5649414 |
| Journal | IEEE Transactions on Geoscience and Remote Sensing |
| Volume | 62 |
| State | Published - 2024 |
Keywords
- Bird's-eye-view (BEV) representation
- multiview collaborative segmentation
- remote sensing
- semantic segmentation
Title
RSBEV: Multiview Collaborative Segmentation of 3-D Remote Sensing Scenes With Bird's-Eye-View Representation