Automatic head gesture learning and synthesis from prosodic cues

Stephen M. Chu, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a novel approach to automatically learn and synthesize head gestures using prosodic features extracted from acoustic speech signals. A minimum entropy hidden Markov model is employed to learn the 3-D head motion of a speaker. The result is a generative model that is compact and highly predictive. The model is further exploited to synchronize the head motion with a set of continuous prosodic observations and to capture the correspondence between the two by sharing its state machine. In synthesis, the prosodic features are used as the cue signal to drive the generative model so that 3-D head gestures can be inferred. A tracking algorithm based on the Bézier volume deformation model is implemented to track the head motion. To evaluate the performance of the proposed system, we compare the true head motion with the prosody-inferred motion. The prosody-to-head-motion mapping acquired through learning is subsequently applied to animate a talking head. Very convincing head gestures are produced when novel prosodic cues of the same speaker are presented.

Original language: English (US)
Title of host publication: 6th International Conference on Spoken Language Processing, ICSLP 2000
Publisher: International Speech Communication Association
ISBN (Electronic): 7801501144, 9787801501141
State: Published - 2000
Event: 6th International Conference on Spoken Language Processing, ICSLP 2000 - Beijing, China
Duration: Oct 16 2000 → Oct 20 2000

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
