TY - GEN
T1 - Tongue Model-Driven Method Based on Fully Connected Neural Network
AU - Zhang, Shaochuan
AU - Li, Fengji
AU - Wang, Li
AU - Zhou, Jie
AU - Niu, Haijun
N1 - Publisher Copyright: ©2024 IEEE.
PY - 2024
Y1 - 2024
AB - Ultrasound technology, which can capture the tongue's contour, has received significant attention in speech visualization, and there is growing interest in using tongue motion data from ultrasound imaging to drive 3D tongue models. However, traditional driving methods do not fully exploit the tongue's contour information. They typically drive the tongue model with a limited number of contour points, which often produces pathological shapes that deviate from natural speech articulation. To address this issue, we propose a method that drives the tongue model using the entire tongue contour captured in ultrasound images. First, the complete tongue contour is extracted from the ultrasound images. Next, a mapping model is built to relate the ultrasound tongue contour to the model's control parameters. Finally, the root mean squared error (RMSE) is used to evaluate the reconstructed control parameters, and a curve similarity index is used to assess how closely the model's midsagittal shape resembles the ultrasound tongue contour. Together, these metrics determine the accuracy of the driven tongue model. The results show that the reconstruction error of the control parameters is within 3%, and the average curve similarity between the tongue model for each phoneme and the corresponding ultrasound tongue contour is approximately 95%. These findings indicate that driving tongue models with the entire tongue contour is feasible: it generates 3D tongue models that match 2D ultrasound images and avoids pathological shapes in the driven tongue model.
KW - speech visualization
KW - tongue contour extraction
KW - tongue model
KW - tongue model control
KW - ultrasound imaging
UR - https://www.scopus.com/pages/publications/85216392823
U2 - 10.1109/ISCSLP63861.2024.10800371
DO - 10.1109/ISCSLP63861.2024.10800371
M3 - Conference contribution
AN - SCOPUS:85216392823
T3 - 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
SP - 121
EP - 125
BT - 2024 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
A2 - Qian, Yanmin
A2 - Jin, Qin
A2 - Ou, Zhijian
A2 - Ling, Zhenhua
A2 - Wu, Zhiyong
A2 - Li, Ya
A2 - Xie, Lei
A2 - Tao, Jianhua
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 14th International Symposium on Chinese Spoken Language Processing, ISCSLP 2024
Y2 - 7 November 2024 through 10 November 2024
ER -