Abstract
Optical remote-sensing image captioning has recently become an active research topic because of its potential military and civil applications, and many methods and data sets have been proposed. Among them, models that follow the encoder-decoder framework perform better in several respects, such as generating more accurate and flexible sentences. However, almost all of these methods use a single fixed receptive field and pay too little attention to multiscale information, which leads to an incomplete image representation. In this letter, we address the multiscale problem and propose two methods, multiscale attention (MSA) and multifeat attention (MFA), to obtain better representations for captioning in the remote-sensing field. The MSA method extracts features from different layers and applies a multihead attention mechanism to each of them to obtain the context feature. The MFA method combines target-level and scene-level features, using target detection as an auxiliary task to enrich the context feature. Experimental results show that both methods outperform the benchmark method on metrics such as BLEU, METEOR, ROUGE_L, and CIDEr.
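The MSA idea described in the abstract (features from several encoder layers, each attended with multihead attention to form a context feature) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `multihead_attention` and `msa_context`, the use of a decoder state as the query, the shared feature width `d_model`, the random projection matrices, and the concatenation of per-layer contexts are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multihead_attention(query, feats, w_q, w_k, w_v, num_heads):
    """Scaled dot-product attention with several heads.

    query: (d_model,) hypothetical decoder state.
    feats: (n, d_model) region features from one encoder layer.
    Returns the context vector (d_model,) and weights (num_heads, n).
    """
    d_model = query.shape[0]
    d_head = d_model // num_heads
    q = (query @ w_q).reshape(num_heads, d_head)        # (h, d_head)
    k = (feats @ w_k).reshape(-1, num_heads, d_head)    # (n, h, d_head)
    v = (feats @ w_v).reshape(-1, num_heads, d_head)    # (n, h, d_head)
    scores = np.einsum("hd,nhd->hn", q, k) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)                  # each head sums to 1
    context = np.einsum("hn,nhd->hd", weights, v).reshape(d_model)
    return context, weights


def msa_context(query, layer_feats, layer_params, num_heads=4):
    """Attend to each layer separately, then concatenate the contexts.

    Concatenation is an assumed fusion step, not taken from the paper.
    """
    contexts = [
        multihead_attention(query, feats, *params, num_heads)[0]
        for feats, params in zip(layer_feats, layer_params)
    ]
    return np.concatenate(contexts)


# Toy setup: two layers with different spatial resolutions (16 and 4 regions),
# assumed to be already projected to a common width d_model.
rng = np.random.default_rng(0)
d_model = 8
layer_feats = [rng.normal(size=(16, d_model)), rng.normal(size=(4, d_model))]
layer_params = [
    tuple(rng.normal(size=(d_model, d_model)) for _ in range(3))
    for _ in layer_feats
]
query = rng.normal(size=d_model)

context = msa_context(query, layer_feats, layer_params, num_heads=4)
```

In this sketch the shallow layer contributes many fine-grained regions and the deep layer a few coarse ones, so the resulting context mixes information from more than one receptive-field size, which is the motivation stated in the abstract.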
| Original language | English |
|---|---|
| Pages (from-to) | 2001-2005 |
| Number of pages | 5 |
| Journal | IEEE Geoscience and Remote Sensing Letters |
| Volume | 18 |
| Issue number | 11 |
| State | Published - 1 Nov 2021 |
Keywords
- Remote-sensing image captioning
- attention
- auxiliary task
- multiscale
Fingerprint
Dive into the research topics of 'Multiscale Methods for Optical Remote-Sensing Image Captioning'. Together they form a unique fingerprint.