TY - JOUR
T1 - RSMamba
T2 - Remote Sensing Image Classification with State Space Model
AU - Chen, Keyan
AU - Chen, Bowen
AU - Liu, Chenyang
AU - Li, Wenyuan
AU - Zou, Zhengxia
AU - Shi, Zhenwei
N1 - Publisher Copyright: © 2004-2012 IEEE.
PY - 2024
Y1 - 2024
N2 - Remote sensing image classification forms the foundation of various understanding tasks, serving a crucial function in remote sensing image interpretation. The recent advancements of convolutional neural networks (CNNs) and transformers have markedly enhanced classification accuracy. Nonetheless, remote sensing scene classification remains a significant challenge, especially given the complexity and diversity of remote sensing scenarios and the variability of spatiotemporal resolutions. The capacity for whole-image understanding can provide more precise semantic cues for scene discrimination. In this letter, we introduce RSMamba, a novel architecture for remote sensing image classification. RSMamba is based on the state space model (SSM) and incorporates an efficient, hardware-aware design known as the Mamba. It integrates the advantages of both a global receptive field and linear modeling complexity. To overcome the limitation of the vanilla Mamba, which can only model causal sequences and is not adaptable to 2-D image data, we propose a dynamic multipath activation mechanism to augment Mamba's capacity to model noncausal data. Notably, RSMamba maintains the inherent modeling mechanism of the vanilla Mamba, yet exhibits superior performance across multiple remote sensing image classification datasets, e.g., F1 scores of 95.25, 92.63, and 95.18 on the UC Merced, AID, and RESISC45 classification datasets, respectively, exceeding those of concurrent Vim and VMamba. This indicates that RSMamba holds significant potential to function as the backbone of future visual foundation models. The code is available at https://github.com/KyanChen/RSMamba.
KW - Backbone network
KW - Mamba
KW - foundation model
KW - image classification
KW - remote sensing images
UR - https://www.scopus.com/pages/publications/85194894987
U2 - 10.1109/LGRS.2024.3407111
DO - 10.1109/LGRS.2024.3407111
M3 - Article
AN - SCOPUS:85194894987
SN - 1545-598X
VL - 21
SP - 1
EP - 5
JO - IEEE Geoscience and Remote Sensing Letters
JF - IEEE Geoscience and Remote Sensing Letters
M1 - 8002605
ER -