Abstract
Convolutional neural networks (CNNs) and transformers have achieved great success in hyperspectral image (HSI) classification. However, CNNs are inefficient in establishing long-range dependencies, and transformers may overlook some local information. To overcome these limitations, we propose a U-shaped convolution-aided transformer (UCaT) that incorporates convolutions into a novel transformer architecture to aid classification. The group convolution is employed as parallel local descriptors to extract detailed features, and then the multi-head self-attention recalibrates these features in consistent groups, emphasizing informative features while maintaining the inherent spectral–spatial data structure. Specifically, three components are constructed using particular strategies. First, the spectral groupwise self-attention (spectral-GSA) component is developed for spectral attention, which selectively emphasizes diagnostic spectral features among neighboring bands and reduces the spectral dimension. Then, the spatial dual-scale convolution-aided self-attention (spatial-DCSA) encoder and spatial convolution-aided cross-attention (spatial-CCA) decoder form a U-shaped architecture for per-pixel classifications over HSI patches, where the encoder utilizes a dual-scale strategy to explore information in different scales and the decoder adopts the cross-attention for information fusion. Experimental results on three datasets demonstrate that the proposed UCaT outperforms the competitors. Additionally, a visual explanation of the UCaT is given, showing its ability to build global interactions and capture pixel-level dependencies.
| Original language | English |
|---|---|
| Article number | 288 |
| Journal | Remote Sensing |
| Volume | 16 |
| Issue number | 2 |
| DOIs | |
| State | Published - Jan 2024 |
Keywords
- convolutional neural networks
- hyperspectral image classification
- spatial attention
- spectral attention
- transformers
Fingerprint
Dive into the research topics of 'A U-Shaped Convolution-Aided Transformer with Double Attention for Hyperspectral Image Classification'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver