TY - GEN
T1 - Expression Fusion to Enhance Video and Speech-Driven 3D Facial Animation
AU - Liu, Yangyue
AU - Hu, Yong
AU - Shen, Xukun
N1 - Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
PY - 2025
Y1 - 2025
N2 - Current monocular 3D face capture and tracking methods struggle to ensure the accuracy of identity and expression, while speech-driven 3D facial animation methods struggle to capture head pose and upper-face expressions. In response, we propose a video- and speech-driven 3D facial animation synthesis method that aims to combine the strengths of both approaches while avoiding their weaknesses. Specifically, to facilitate animation and improve the accuracy of character identity, we generate character appearance templates by registering a 3D morphable model (3DMM) to a rigid model. To address the limitations of each method in acquiring expressions and poses, we design an expression fusion network in the 3DMM space that fuses the expression data acquired from the different modalities and outputs unified facial expression data. Finally, because the dataset does not capture eye movements, we design an eye movement enhancement network that adds eyelid movements by modifying the facial expression data and then replaces the eye region to obtain the final 3D face mesh animation. Through detailed experiments, we demonstrate that our method generates 3D face animations with synchronized speech and visuals, and that it outperforms current methods that rely on monocular video or on speech alone.
KW - 3D facial animation
KW - 3D talking-head generation
KW - Facial reenactment
KW - Multimodal expression fusion
UR - https://www.scopus.com/pages/publications/105000683976
U2 - 10.1007/978-3-031-82021-2_17
DO - 10.1007/978-3-031-82021-2_17
M3 - Conference contribution
AN - SCOPUS:105000683976
SN - 9783031820205
T3 - Lecture Notes in Computer Science
SP - 245
EP - 257
BT - Advances in Computer Graphics - 41st Computer Graphics International Conference, CGI 2024, Proceedings
A2 - Magnenat-Thalmann, Nadia
A2 - Kim, Jinman
A2 - Sheng, Bin
A2 - Deng, Zhigang
A2 - Thalmann, Daniel
A2 - Li, Ping
PB - Springer Science and Business Media Deutschland GmbH
T2 - 41st Computer Graphics International Conference, CGI 2024
Y2 - 1 July 2024 through 5 July 2024
ER -