A complex Morlet-convolutional attention network framework for robust direction-of-arrival estimation

  • Zhanying Hou
  • Weiqing Xu*
  • Jun Zheng
  • Hanxu Zhang
  • Guanwei Jia
  • Maolin Cai

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Sound source localization using deep learning presents great potential, but its widespread application is often hindered by the limited availability of real-world labeled data needed to train high-capacity models. This work introduces a deep learning-based framework designed to address this data scarcity challenge in indoor direction-of-arrival (DoA) estimation tasks. Specifically, complex Morlet wavelet transforms are used to generate high-resolution time-frequency representations from multichannel microphone array signals, capturing both temporal and spectral information, including crucial phase cues. These representations are then fed into a hybrid CoAtNet architecture that combines convolutional layers with self-attention mechanisms to enable effective local feature extraction and global spatial context modeling. To mitigate the dependence on extensive real-world datasets, a two-stage training strategy is adopted: large-scale synthetic data generated via Pyroomacoustics is used for pretraining, followed by fine-tuning on a small set of real-world samples for domain adaptation. Experimental results demonstrate that the proposed system achieves 98.22 % accuracy on real recordings and 95.21 % on the SLoClas benchmark dataset, outperforming baseline deep learning models. The proposed framework offers a practical and efficient solution for sound source localization in real-world applications where labeled data is limited.
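The wavelet front end described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the wavelet parameters, normalisation, or feature layout, so everything below is an assumption made for illustration: the function names `complex_morlet_cwt` and `wavelet_features`, the `n_cycles` width parameter, the three-frequency grid, and the two-microphone test tone are hypothetical and are not taken from the paper. The sketch only shows why a complex wavelet preserves the inter-channel phase cues that direction-of-arrival estimation relies on.

```python
import numpy as np

def complex_morlet_cwt(signal, fs, freqs, n_cycles=6.0):
    """Complex Morlet coefficients, shape (len(freqs), len(signal)).

    Assumes the signal is longer than the longest wavelet
    (otherwise np.convolve(..., mode="same") would change the length).
    """
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(freqs), signal.size), dtype=complex)
    for i, f in enumerate(freqs):
        sigma_t = n_cycles / (2.0 * np.pi * f)        # Gaussian width in seconds
        half = int(np.ceil(4.0 * sigma_t * fs))       # truncate at +/- 4 sigma
        t = np.arange(-half, half + 1) / fs
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2.0 * sigma_t**2))
        wavelet /= np.abs(wavelet).sum()              # unit gain at the centre frequency
        # The Morlet wavelet is Hermitian-symmetric, so plain convolution
        # equals correlation with its complex conjugate.
        out[i] = np.convolve(signal, wavelet, mode="same")
    return out

def wavelet_features(multichannel, fs, freqs):
    """Stack magnitude and phase maps for every channel (assumed layout)."""
    coeffs = np.stack([complex_morlet_cwt(ch, fs, freqs) for ch in multichannel])
    feats = np.concatenate([np.abs(coeffs), np.angle(coeffs)], axis=0)
    return feats, coeffs

# Hypothetical two-microphone example: a 1 kHz tone that reaches the
# second microphone two samples later than the first.
fs = 16000
f0 = 1000.0
delay = 2 / fs
t = np.arange(1600) / fs
mics = np.stack([np.cos(2 * np.pi * f0 * t),
                 np.cos(2 * np.pi * f0 * (t - delay))])

freqs = np.array([500.0, 1000.0, 2000.0])
feats, coeffs = wavelet_features(mics, fs, freqs)

# Inter-channel phase difference; in the 1 kHz row, away from the edges,
# it should be close to 2*pi*f0*delay for a pure tone.
ipd = np.angle(coeffs[0] * np.conj(coeffs[1]))
print(feats.shape, ipd[1, 800], 2 * np.pi * f0 * delay)
```

In this sketch the magnitude maps carry the spectral and temporal envelope, while the phase maps retain the time-delay information between microphones. How the paper actually arranges these maps as CoAtNet input is not stated in the abstract.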

Original language: English
Article number: 105683
Journal: Digital Signal Processing: A Review Journal
Volume: 168
State: Published - Jan 2026

Keywords

  • CoAtNet model
  • Complex Morlet wavelet transform
  • Direction of arrival
  • Microphone arrays
