TY - JOUR
T1 - Mandarin speech reconstruction from surface electromyography based on generative adversarial networks
AU - Li, Fengji
AU - Shen, Fei
AU - Ma, Ding
AU - Zhou, Jie
AU - Wang, Li
AU - Fan, Fan
AU - Liu, Tao
AU - Chen, Xiaohong
AU - Toda, Tomoki
AU - Niu, Haijun
N1 - Publisher Copyright: © 2025 The Authors
PY - 2025/6
Y1 - 2025/6
AB - The loss of speech function due to conditions such as laryngectomy and vocal cord paralysis significantly impacts the quality of life for patients. Achieving effective communication for these patients is a goal pursued by researchers. This study primarily explores a method for reconstructing Mandarin speech based on voice-related neck and facial surface electromyography (sEMG). Neck and facial sEMG signals and speech waveform were synchronously collected during normal speech production. A speech reconstruction model for Mandarin speech, based on multi-scale feature extraction from EMG and a generative adversarial network (GAN), was developed. Both subjective and objective evaluations were conducted to assess the speech reconstruction performance of the model. The evaluation results indicate that the model effectively reconstructs speech from neck and facial sEMG signals. The reconstructed speech closely matches the original in terms of spectrogram and fundamental frequency, with mel-cepstrum distortion of 8.45 dB, log F0 RMSE of 0.40, F0 correlation coefficient of 0.71 and F0 voiced/unvoiced estimation accuracy of 0.80. The character error rate of the reconstructed speech is 0.32, while the tone error rate is 0.26. The subjective listening test results show that the naturalness of the reconstructed speech is acceptable, with a mean opinion score greater than 3. This study demonstrates the potential of deep learning techniques in effectively reconstructing Mandarin speech from sEMG.
KW - Generative adversarial networks
KW - Mandarin speech
KW - Surface electromyography
KW - Speech reconstruction
UR - https://www.scopus.com/pages/publications/86000499663
U2 - 10.1016/j.medntd.2025.100359
DO - 10.1016/j.medntd.2025.100359
M3 - Article
AN - SCOPUS:86000499663
SN - 2590-0935
VL - 26
JO - Medicine in Novel Technology and Devices
JF - Medicine in Novel Technology and Devices
M1 - 100359
ER -