Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar

Hao Tang, Yuxiao Hu, Yun Fu, Mark Allan Hasegawa-Johnson, Thomas S Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we propose a complete pipeline of efficient and low-cost techniques to construct, on the fly, a realistic 3D text-driven emotive audio-visual avatar from a single 2D frontal-view face image of any person. This real-time conversion is achieved in three steps. First, a personalized 3D face model is built from the 2D face image using a fully automatic 3D face shape and texture reconstruction framework. Second, using standard MPEG-4 FAPs (Facial Animation Parameters), the face model is animated by the viseme and expression channels and is complemented by a visual prosody channel that controls head, eye, and eyelid movements. Finally, the facial animation is combined and synchronized with emotive synthetic speech generated by incorporating an emotion transformer into a Festival-MBROLA text-to-neutral-speech synthesizer.
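The synchronization mentioned in the final step can be illustrated with a minimal sketch, assuming the speech synthesizer exposes a list of phonemes with durations in milliseconds. The phoneme-to-viseme grouping, the `Keyframe` type, and the `viseme_track` function are hypothetical names introduced for illustration; they are not taken from the paper, from the MPEG-4 viseme table, or from any Festival or MBROLA API.

```python
from dataclasses import dataclass

# Illustrative phoneme-to-viseme grouping (an assumption for this sketch,
# not the table used in the paper or defined by MPEG-4).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "a": "open", "o": "rounded", "u": "rounded",
    "_": "neutral",  # silence
}


@dataclass
class Keyframe:
    time_ms: int  # position on the audio timeline
    viseme: str   # mouth shape to reach at that time


def viseme_track(phonemes):
    """Turn (phoneme, duration_ms) pairs into viseme keyframes.

    Each keyframe sits at the start time of its phoneme, so the animation
    shares the audio timeline. Consecutive identical visemes are merged,
    and a closing neutral keyframe marks the end of the utterance.
    """
    track = []
    t = 0
    for phoneme, duration_ms in phonemes:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        if not track or track[-1].viseme != viseme:
            track.append(Keyframe(t, viseme))
        t += duration_ms
    track.append(Keyframe(t, "neutral"))
    return track


if __name__ == "__main__":
    # Hypothetical timing for a short utterance with leading silence.
    for frame in viseme_track([("_", 100), ("m", 80), ("a", 120), ("p", 60)]):
        print(frame.time_ms, frame.viseme)
```

Deriving the keyframe times from the same durations that drive the audio is one simple way to keep lip movement and speech aligned; a real system would additionally blend between keyframes and layer expression and head-movement parameters on top.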

Original language: English (US)
Title of host publication: 2008 IEEE International Conference on Multimedia and Expo, ICME 2008 - Proceedings
Pages: 1205-1208
Number of pages: 4
DOI: https://doi.org/10.1109/ICME.2008.4607657
State: Published - 2008
Event: 2008 IEEE International Conference on Multimedia and Expo, ICME 2008 - Hannover, Germany
Duration: Jun 23 2008 → Jun 26 2008

Other

Other: 2008 IEEE International Conference on Multimedia and Expo, ICME 2008
Country: Germany
City: Hannover
Period: 6/23/08 → 6/26/08

Keywords

  • 3D face reconstruction
  • Facial animation
  • MPEG-4
  • Text-to-speech
  • Viseme

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Electrical and Electronic Engineering


Cite this

Tang, H., Hu, Y., Fu, Y., Hasegawa-Johnson, M. A., & Huang, T. S. (2008). Real-time conversion from a single 2D face image to a 3D text-driven emotive audio-visual avatar. In 2008 IEEE International Conference on Multimedia and Expo, ICME 2008 - Proceedings (pp. 1205-1208). [4607657] https://doi.org/10.1109/ICME.2008.4607657