TY - GEN
T1 - Mandarin Electro-Laryngeal Speech Enhancement based on Statistical Voice Conversion and Manual Tone Control
AU - Qian, Zhaopeng
AU - Niu, Haijun
AU - Wang, Li
AU - Kobayashi, Kazuhiro
AU - Zhang, Shaochuan
AU - Toda, Tomoki
N1 - Publisher Copyright: © 2021 APSIPA.
PY - 2021
Y1 - 2021
N2 - An Electro-Larynx can help laryngectomees regain the ability to speak, but Electro-Laryngeal (EL) speech has poor intelligibility and naturalness. Recently, voice conversion (VC) has been applied to enhance EL speech with good results. However, the complicated tone variation rules of continuous Mandarin EL speech pose a new challenge for VC-based enhancement of EL speech. In this paper, a novel framework combining manual tone control (MTC) and statistical VC is proposed to enhance continuous Mandarin EL speech. GMM-based VC and CLDNN-based VC are implemented as the statistical VC methods in the proposed framework. Objective and subjective evaluations are designed to validate the proposed framework. The experimental results demonstrate that 1) combining MTC and statistical VC significantly improves both the naturalness and the intelligibility of the enhanced Mandarin EL speech, 2) statistical VC reduces the word perception error rate from 11.35% for Mandarin EL speech with MTC to 5.61% for the enhanced speech, and 3) the proposed framework achieves an average tone accuracy 26.59% higher than that of the original continuous Mandarin EL speech.
AB - An Electro-Larynx can help laryngectomees regain the ability to speak, but Electro-Laryngeal (EL) speech has poor intelligibility and naturalness. Recently, voice conversion (VC) has been applied to enhance EL speech with good results. However, the complicated tone variation rules of continuous Mandarin EL speech pose a new challenge for VC-based enhancement of EL speech. In this paper, a novel framework combining manual tone control (MTC) and statistical VC is proposed to enhance continuous Mandarin EL speech. GMM-based VC and CLDNN-based VC are implemented as the statistical VC methods in the proposed framework. Objective and subjective evaluations are designed to validate the proposed framework. The experimental results demonstrate that 1) combining MTC and statistical VC significantly improves both the naturalness and the intelligibility of the enhanced Mandarin EL speech, 2) statistical VC reduces the word perception error rate from 11.35% for Mandarin EL speech with MTC to 5.61% for the enhanced speech, and 3) the proposed framework achieves an average tone accuracy 26.59% higher than that of the original continuous Mandarin EL speech.
UR - https://www.scopus.com/pages/publications/85126709489
M3 - Conference contribution
AN - SCOPUS:85126709489
T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
SP - 546
EP - 552
BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Y2 - 14 December 2021 through 17 December 2021
ER -