An improved noise-robust voice activity detector based on hidden semi-Markov models

Research output: Contribution to journal › Article › peer-review

Abstract

To improve the performance of voice activity detectors (VADs) in noisy environments, this paper concentrates on three aspects critical to noise robustness: speech features, feature distributions, and temporal dependence. Based on statistics computed on the TIMIT and NOIZEUS corpora, Mel-frequency cepstral coefficients (MFCCs) are selected as speech features, Gaussian mixture distributions (GMDs) are used to model the MFCC observations in both the speech and non-speech states, and Weibull and Gamma distributions are used to explicitly model the durations of noise and speech, respectively. To integrate these aspects into the VAD, the hidden semi-Markov model (HSMM), a generalization of the hidden Markov model (HMM), is introduced first. The VAD decision is then made by a likelihood ratio test (LRT) that incorporates prior knowledge of the states and modified forward variables of the HSMM, and a recursive scheme is designed to compute these modified forward variables efficiently. Finally, a series of experiments demonstrates (1) the positive effect of the individual robustness-related schemes adopted in the proposed VAD and (2) better performance than the standard ITU-T G.729B, Adaptive Multi-Rate VAD phase 2 (AMR2), and Advanced Front-end (AFE) VADs, an HMM-based VAD, and a VAD using a Laplacian-Gaussian model (LD-GD based VAD).
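The abstract combines GMD observation likelihoods, Weibull and Gamma duration models, HSMM forward variables, and an LRT decision. The Python sketch below wires these pieces together under clearly hypothetical assumptions. It uses the textbook explicit-duration forward recursion, alpha_t(j) = sum over d of start_{t-d+1}(j) · p_j(d) · prod_{u=t-d+1..t} b_j(o_u), not the paper's modified forward variables, whose exact definition is not given here. The two states strictly alternate, durations are truncated at a hypothetical `max_dur`, and the usage feeds 1-D features instead of MFCC vectors. All helper names (`duration_table`, `hsmm_vad`), parameter values, and the zero `threshold` are illustrative. Per frame it compares log P(o_1..o_t, q_t = speech) with log P(o_1..o_t, q_t = non-speech). This is a posterior-odds test that folds in the state priors, and it is not presented as the authors' decision rule.

```python
import math

NEG_INF = -math.inf

def safe_log(x):
    return math.log(x) if x > 0 else NEG_INF

def logsumexp(vals):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    finite = [v for v in vals if v != NEG_INF]
    if not finite:
        return NEG_INF
    top = max(finite)
    return top + math.log(sum(math.exp(v - top) for v in finite))

def log_gmm(x, weights, means, variances):
    """Log density of a diagonal-covariance Gaussian mixture at feature vector x."""
    terms = []
    for w, mean, var in zip(weights, means, variances):
        ll = sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
                 for xi, m, v in zip(x, mean, var))
        terms.append(math.log(w) + ll)
    return logsumexp(terms)

def weibull_pdf(d, shape, scale):
    z = d / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-z ** shape)

def gamma_pdf(d, shape, scale):
    return math.exp((shape - 1) * math.log(d) - d / scale
                    - math.lgamma(shape) - shape * math.log(scale))

def duration_table(pdf, max_dur, **params):
    """Discretize a duration pdf on 1..max_dur frames (renormalized, hypothetical
    truncation). Returns (log p(d), log P(D >= d)), both indexed by d - 1."""
    raw = [pdf(d, **params) for d in range(1, max_dur + 1)]
    total = sum(raw)
    p = [r / total for r in raw]
    surv = [sum(p[k:]) for k in range(max_dur)]
    return [safe_log(v) for v in p], [safe_log(v) for v in surv]

def hsmm_vad(frames, gmms, log_dur, log_surv, log_prior, threshold=0.0):
    """Frame labels from a 2-state explicit-duration HSMM (hypothetical sketch).
    State 0 = non-speech, 1 = speech; segments strictly alternate."""
    T, D = len(frames), len(log_dur[0])
    log_b = [[log_gmm(o, *gmms[j]) for j in (0, 1)] for o in frames]
    log_alpha = [[NEG_INF, NEG_INF] for _ in range(T)]  # a segment of j ends at t
    log_start = [[NEG_INF, NEG_INF] for _ in range(T)]  # a segment of j starts at t
    labels = []
    for t in range(T):
        for j in (0, 1):
            # Entry into j at t: the initial prior, or the other state's segment ended at t-1.
            log_start[t][j] = log_prior[j] if t == 0 else log_alpha[t - 1][1 - j]
        log_occupy = []
        for j in (0, 1):
            ends, covers, emit = [], [], 0.0
            for d in range(1, min(D, t + 1) + 1):
                s = t - d + 1
                emit += log_b[s][j]  # log prod of b_j(o_u) for u = s..t
                ends.append(log_start[s][j] + log_dur[j][d - 1] + emit)
                covers.append(log_start[s][j] + log_surv[j][d - 1] + emit)
            log_alpha[t][j] = logsumexp(ends)     # forward variable alpha_t(j)
            log_occupy.append(logsumexp(covers))  # log P(o_1..o_t, q_t = j)
        # Log-ratio test on joint probabilities (state priors included).
        labels.append(1 if log_occupy[1] - log_occupy[0] > threshold else 0)
    return labels

# Hypothetical usage: 1-D features stand in for MFCC vectors.
noise_gmm = ([0.5, 0.5], [[-0.5], [0.5]], [[1.0], [1.0]])
speech_gmm = ([0.6, 0.4], [[4.5], [5.5]], [[1.0], [1.0]])
noise_dur = duration_table(weibull_pdf, 80, shape=2.0, scale=20.0)
speech_dur = duration_table(gamma_pdf, 80, shape=4.0, scale=5.0)
frames = [[0.0]] * 20 + [[5.0]] * 20 + [[0.0]] * 20
labels = hsmm_vad(frames, (noise_gmm, speech_gmm),
                  (noise_dur[0], speech_dur[0]), (noise_dur[1], speech_dur[1]),
                  log_prior=(math.log(0.5), math.log(0.5)))
```

The recursion runs in the log domain to avoid underflow, and costs O(T·D) per state. It uses only o_1..o_t at frame t, so the decision is causal. On the well-separated synthetic frames above, it labels the three blocks as non-speech, speech, and non-speech. Real MFCC features would need trained GMD and duration parameters.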

Original language: English
Pages (from-to): 1044-1053
Number of pages: 10
Journal: Pattern Recognition Letters
Volume: 32
Issue number: 7
State: Published (1 May 2011)

Keywords

  • Forward variable
  • Hidden semi-Markov model
  • Likelihood ratio test
  • Observation distribution
  • State duration
  • Voice activity detection
