Abstract
Recognizing underwater acoustic targets is challenging due to the complex characteristics of acoustic sources and channels. The limited perspective offered by any single feature representation further complicates the task. This study addresses the issue of insufficient accuracy by proposing a novel fusion recognition algorithm, CAF-ViT, which maps multiple time-frequency representation features to category outputs. We propose an improved Vision Transformer, 1D-ViT, to enhance self-attention feature extraction from LOFAR, Mel spectrum, and wavelet packet features, yielding increases in category prediction accuracy of 5.1%, 4.7%, and 6.2%, respectively. Additionally, a two-stage fusion framework is presented, comprising a feature fusion module that fuses feature pairs via a cross-attention mechanism and a decision fusion module that determines the final category prediction via confidence weighting. Experimental results demonstrate that our method outperforms the comparison algorithms, achieving the best recognition accuracy of 83.7% on the "DeepShip" dataset. Furthermore, ablation experiments analyze the contribution of each module to the overall improvement of the proposed method.
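The abstract does not give implementation details, but the two fusion stages it names can be illustrated with a minimal sketch: scaled dot-product cross-attention letting one feature branch attend to another, followed by confidence-weighted averaging of per-branch class probabilities. All function names, the identity projections, and the use of peak probability as the confidence weight are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_feats, kv_feats):
    """Feature-fusion stage (sketch): tokens from one branch attend to
    tokens from another via scaled dot-product attention.
    Identity Q/K/V projections are an assumed simplification."""
    d_k = query_feats.shape[-1]
    scores = query_feats @ kv_feats.T / np.sqrt(d_k)   # (n_q, n_kv)
    return softmax(scores, axis=-1) @ kv_feats          # (n_q, d)

def confidence_weighted_fusion(branch_probs):
    """Decision-fusion stage (sketch): weight each branch's class
    probabilities by its peak confidence, renormalize, take argmax."""
    probs = np.asarray(branch_probs, dtype=float)       # (n_branches, n_classes)
    weights = probs.max(axis=1)                         # assumed confidence measure
    fused = (weights[:, None] * probs).sum(axis=0)
    fused /= fused.sum()
    return int(fused.argmax()), fused

# Toy usage: fuse two 4-token, 8-dim feature branches, then fuse
# three hypothetical branch predictions over 3 classes.
rng = np.random.default_rng(0)
a, b = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
fused_feats = cross_attention(a, b)                     # shape (4, 8)
label, fused_probs = confidence_weighted_fusion(
    [[0.7, 0.2, 0.1], [0.4, 0.5, 0.1], [0.6, 0.3, 0.1]])
```

Here the cross-attended output keeps the query branch's token count while mixing in the other branch's content, and the decision stage lets more confident branches dominate the final vote.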
| Original language | English |
|---|---|
| Article number | 120049 |
| Journal | Ocean Engineering |
| Volume | 318 |
| DOIs | |
| State | Published - 15 Feb 2025 |
Keywords
- Cross-attention
- Feature fusion
- Self-attention
- Underwater acoustic target recognition
- Vision transformer
Fingerprint
Dive into the research topics of 'CAF-ViT: A cross-attention based Transformer network for underwater acoustic target recognition'.