RSCaMa: Remote Sensing Image Change Captioning With State Space Model

  • Beihang University
  • Shanghai Artificial Intelligence Laboratory

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Remote sensing image change captioning (RSICC) aims to describe surface changes between multitemporal remote sensing images in language, including the changed object categories, locations, and dynamics of changing objects (e.g., added or disappeared). This poses challenges to spatial and temporal modeling of bi-temporal features. Despite previous methods progressing in the spatial change perception, there are still weaknesses in joint spatial-temporal modeling. To address this, in this letter, we propose a novel RSCaMa model, which achieves efficient joint spatial-temporal modeling through multiple CaMa layers, enabling iterative refinement of bi-temporal features. To achieve efficient spatial modeling, we introduce the recently popular Mamba [a state space model (SSM)] with a global receptive field and linear complexity into the RSICC task and propose the Spatial Difference-aware SSM (SD-SSM), overcoming limitations of previous convolutional neural network (CNN)- and Transformer-based methods in the receptive field and computational complexity. SD-SSM enhances the model's ability to capture spatial changes sharply. In terms of efficient temporal modeling, considering the potential correlation between the temporal scanning characteristics of Mamba and the temporality of the RSICC, we propose the Temporal-Traversing SSM (TT-SSM), which scans bi-temporal features in a temporal crosswise manner, enhancing the model's temporal understanding and information interaction. Experiments validate the effectiveness of the efficient joint spatial-temporal modeling and demonstrate the outstanding performance of RSCaMa and the potential of the Mamba in the RSICC task. Additionally, we systematically compare three different language decoders, including Mamba, generative pre-trained Transformer (GPT)-style decoder, and Transformer decoder, providing valuable insights for future RSICC research. The code will be available at https://github.com/Chen-Yang-Liu/RSCaMa.

Original language: English
Article number: 6010405
Pages (from-to): 1-5
Number of pages: 5
Journal: IEEE Geoscience and Remote Sensing Letters
Volume: 21
DOI
Publication status: Published - 2024
