TY - JOUR
T1 - Real-time speech-driven face animation with expressions using neural networks
AU - Hong, Pengyu
AU - Wen, Zhen
AU - Huang, Thomas S.
N1 - Funding Information:
Manuscript received April 23, 2001; revised November 15, 2001. This work was supported in part by the U.S. Army Research Laboratory under Cooperative Agreement DAAL01-96-2-0003, and in part by the National Science Foundation under Grant IIS-00-85980.
PY - 2002/7
Y1 - 2002/7
N2 - A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expression using neural networks. Based on this framework, we learn a quantitative visual representation of the facial deformations, called the motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm, which is used to collect an audio-visual training database. Then, we construct a real-time audio-to-MUP mapping by training a set of neural networks using the collected audio-visual training database. The quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of the effectiveness of their influences on bimodal human emotion perception.
AB - A real-time speech-driven synthetic talking face provides an effective multimodal communication interface in distributed collaboration environments. Nonverbal gestures such as facial expressions are important to human communication and should be considered by speech-driven face animation systems. In this paper, we present a framework that systematically addresses facial deformation modeling, automatic facial motion analysis, and real-time speech-driven face animation with expression using neural networks. Based on this framework, we learn a quantitative visual representation of the facial deformations, called the motion units (MUs). A facial deformation can be approximated by a linear combination of the MUs weighted by MU parameters (MUPs). We develop an MU-based facial motion tracking algorithm, which is used to collect an audio-visual training database. Then, we construct a real-time audio-to-MUP mapping by training a set of neural networks using the collected audio-visual training database. The quantitative evaluation of the mapping shows the effectiveness of the proposed approach. Using the proposed method, we develop the functionality of real-time speech-driven face animation with expressions for the iFACE system. Experimental results show that the synthetic expressive talking face of the iFACE system is comparable with a real face in terms of the effectiveness of their influences on bimodal human emotion perception.
KW - Facial deformation modeling
KW - Facial motion analysis and synthesis
KW - Neural networks
KW - Real-time speech-driven talking face with expressions
UR - http://www.scopus.com/inward/record.url?scp=0036650837&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0036650837&partnerID=8YFLogxK
U2 - 10.1109/TNN.2002.1021892
DO - 10.1109/TNN.2002.1021892
M3 - Article
C2 - 18244487
AN - SCOPUS:0036650837
SN - 1045-9227
VL - 13
SP - 916
EP - 927
JO - IEEE Transactions on Neural Networks
JF - IEEE Transactions on Neural Networks
IS - 4
ER -