TY - GEN
T1 - A robust voice activity detector based on Weibull and Gaussian mixture distribution
AU - Liang, Yuan
AU - Liu, Xianglong
AU - Zhou, Mi
AU - Lou, Yihua
AU - Shan, Baosong
PY - 2010
Y1 - 2010
N2 - In this paper, we focus on the observation and state duration distributions in hidden semi-Markov model (HSMM)-based voice activity detection. To perform robustly in noisy environments, the raw speech is first filtered with a modified Wiener filter, and acoustic features of the noisy speech are then extracted with a Mel-frequency cepstral coefficient (MFCC) processor. Based on statistics from the TIMIT database, we use Gaussian mixture distributions (GMD) for both the speech and non-speech states to link the MFCC feature vectors to the state sequences. Unlike in an HMM, the transition probability in an HSMM is not constant but depends on the time already elapsed in the most recent state; in this paper it is modeled by a Weibull distribution (WD). The final VAD decision is made by a likelihood ratio test (LRT) that incorporates prior knowledge of the states. In addition, an adaptive threshold is used to achieve better detection results. Experiments on noisy speech data show that the proposed method is more robust and accurate than the ITU-T G.729B standard, AMR2, an HMM-based VAD, and a VAD using a Laplacian-Gaussian model.
KW - Gaussian Mixture Distribution
KW - Voice activity detection
KW - Weibull distribution
UR - https://www.scopus.com/pages/publications/77957274583
U2 - 10.1109/ICSPS.2010.5555230
DO - 10.1109/ICSPS.2010.5555230
M3 - Conference contribution
AN - SCOPUS:77957274583
SN - 9781424468911
T3 - ICSPS 2010 - Proceedings of the 2010 2nd International Conference on Signal Processing Systems
SP - V226-V230
BT - ICSPS 2010 - Proceedings of the 2010 2nd International Conference on Signal Processing Systems
T2 - 2010 2nd International Conference on Signal Processing Systems, ICSPS 2010
Y2 - 5 July 2010 through 7 July 2010
ER -