
Remote Sensing Image Change Captioning With Dual-Branch Transformers: A New Method and a Large Scale Dataset

  • Beihang University
  • NetEase Fuxi AI Lab

Research output: Contribution to journal › Article › peer-review

Abstract

Analyzing land cover changes with multitemporal remote sensing (RS) images is crucial for environmental protection and land planning. In this article, we explore RS image change captioning (RSICC), a new task that aims to generate human-like language descriptions of the land cover changes in multitemporal RS images. We propose a novel Transformer-based RSICC model (RSICCformer). It consists of three main components: 1) a CNN-based feature extractor that generates high-level features of RS image pairs; 2) a dual-branch Transformer encoder (DTE) that improves the feature discrimination capacity for the changes; and 3) a caption decoder that generates sentences describing the differences. The DTE consists of a hierarchy of processing stages to capture and recognize multiple changes of interest. Concretely, in the DTE we use the bitemporal feature differences as keys to enhance the image features (queries) from each temporal image. To explore the RSICC task, we build a large-scale dataset named LEVIR-CC (LEVIR Change Captioning), which contains 10077 pairs of bitemporal RS images and 50385 sentences describing the differences between the images. We benchmark existing state-of-the-art synthetic image change captioning methods on the LEVIR-CC dataset, and our RSICCformer outperforms previous methods by a significant margin (+4.98% on BLEU-4 and +9.86% on CIDEr-D). The attention visualization results also suggest that our model can focus on changes of interest and ignore irrelevant changes.
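
The difference-as-key idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that bitemporal feature differences act as keys and per-image features act as queries. Reusing the difference as the attention values, adding a residual connection, omitting learned projections, and the names `cross_attention` and `dual_branch_step` are all assumptions made here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention; each argument is a list of token vectors."""
    dim = len(keys[0])
    attended = []
    for q in queries:
        # Similarity of this query token to every key token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        attended.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return attended


def dual_branch_step(feat_t1, feat_t2):
    """One hypothetical encoder stage (assumed design, not from the paper).

    Keys come from the bitemporal feature difference, as the abstract states.
    Using the difference as values and adding a residual are assumptions.
    """
    diff = [
        [b - a for a, b in zip(tok1, tok2)]
        for tok1, tok2 in zip(feat_t1, feat_t2)
    ]
    enhanced = []
    for feat in (feat_t1, feat_t2):  # one branch per temporal image
        context = cross_attention(feat, diff, diff)
        enhanced.append(
            [[x + c for x, c in zip(tok, ctx)] for tok, ctx in zip(feat, context)]
        )
    return enhanced[0], enhanced[1]


# Toy features: two tokens of dimension two per image; only token 2 changes.
before = [[1.0, 0.0], [0.0, 1.0]]
after = [[1.0, 0.0], [2.0, 1.0]]
out_t1, out_t2 = dual_branch_step(before, after)
```

In this sketch, two identical inputs give a zero difference, so both branches return their features unchanged; only regions that differ between the two dates contribute to the enhancement.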

Original language: English
Article number: 5633520
Journal: IEEE Transactions on Geoscience and Remote Sensing
Volume: 60
State: Published - 2022

Keywords

  • Change captioning (CC)
  • Transformer
  • change detection (CD)
  • image captioning
  • remote sensing (RS) images
