TY - GEN
T1 - Automatic head gesture learning and synthesis from prosodic cues
AU - Chu, Stephen M.
AU - Huang, Thomas S.
N1 - Funding Information:
We acknowledge the support from the Army Research Laboratory. This work was also supported in part by National Science Foundation Grant CDA 96-24396.
PY - 2000
Y1 - 2000
N2 - We present a novel approach to automatically learning and synthesizing head gestures using prosodic features extracted from acoustic speech signals. A minimum-entropy hidden Markov model is employed to learn the 3-D head motion of a speaker. The result is a generative model that is compact and highly predictive. The model is further exploited to synchronize the head motion with a set of continuous prosodic observations and to capture the correspondence between the two by sharing its state machine. In synthesis, the prosodic features are used as the cue signal to drive the generative model so that 3-D head gestures can be inferred. A tracking algorithm based on the Bézier volume deformation model is implemented to track the head motion. To evaluate the performance of the proposed system, we compare the true head motion with the prosody-inferred motion. The prosody-to-head-motion mapping acquired through learning is subsequently applied to animate a talking head. Very convincing head gestures are produced when novel prosodic cues from the same speaker are presented.
AB - We present a novel approach to automatically learning and synthesizing head gestures using prosodic features extracted from acoustic speech signals. A minimum-entropy hidden Markov model is employed to learn the 3-D head motion of a speaker. The result is a generative model that is compact and highly predictive. The model is further exploited to synchronize the head motion with a set of continuous prosodic observations and to capture the correspondence between the two by sharing its state machine. In synthesis, the prosodic features are used as the cue signal to drive the generative model so that 3-D head gestures can be inferred. A tracking algorithm based on the Bézier volume deformation model is implemented to track the head motion. To evaluate the performance of the proposed system, we compare the true head motion with the prosody-inferred motion. The prosody-to-head-motion mapping acquired through learning is subsequently applied to animate a talking head. Very convincing head gestures are produced when novel prosodic cues from the same speaker are presented.
UR - http://www.scopus.com/inward/record.url?scp=85009069705&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85009069705&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85009069705
T3 - 6th International Conference on Spoken Language Processing, ICSLP 2000
BT - 6th International Conference on Spoken Language Processing, ICSLP 2000
PB - International Speech Communication Association
T2 - 6th International Conference on Spoken Language Processing, ICSLP 2000
Y2 - 16 October 2000 through 20 October 2000
ER -