A procedure for estimating gestural scores from natural speech

Hosung Nam, Vikramjit Mitra, Mark Tiede, Elliot Saltzman, Louis Goldstein, Carol Espy-Wilson, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Speech can be represented as a constellation of constricting events, called gestures, each defined at a distinct vocal tract site; this representation is called a gestural score. Gestures and their output trajectories (tract variables), which have been available only for synthetic speech, have recently been shown to improve automatic speech recognition (ASR) performance. In this paper we propose an iterative, analysis-by-synthesis, landmark-based time-warping architecture for obtaining gestural scores for natural speech. Given an utterance, the Haskins Laboratories Task Dynamics and Application (TADA) model was used to generate a prototype gestural score and the corresponding synthetic acoustic output. An optimal gestural score was then estimated through iterative time warping that minimizes the distance between the original and the TADA-synthesized speech. We compared the performance of our approach with that of a conventional dynamic time warping procedure using log-spectral and Itakura distance measures. We also performed a word recognition experiment using the gestural annotations to show that the estimated gestural scores are suitable for word recognition.
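The abstract evaluates the proposed method against a conventional dynamic time warping (DTW) baseline, scored with log-spectral and Itakura distances. The sketch below illustrates only those standard ingredients; it is not the authors' landmark-based procedure. Assumptions: frames are already windowed (power spectra for the log-spectral distance, time-domain samples for the Itakura distance), the LPC order of 10 and the DTW step pattern (1,0), (0,1), (1,1) are illustrative choices, and all function names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def log_spectral_distance(p: Frame, q: Frame, eps: float = 1e-12) -> float:
    """RMS difference in dB between two equal-length power spectra."""
    diffs = [10.0 * math.log10((a + eps) / (b + eps)) for a, b in zip(p, q)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def autocorr(x: Frame, order: int) -> List[float]:
    """Biased autocorrelation r[0..order] of one windowed time-domain frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]


def levinson(r: Sequence[float], order: int) -> List[float]:
    """Levinson-Durbin recursion: LPC polynomial [1, a1, ..., ap],
    with A(z) = 1 + sum_k a_k z^-k. Assumes r[0] > 0 (non-silent frame)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # updated prediction-error energy
    return a


def quad_form(a: Sequence[float], r: Sequence[float]) -> float:
    """a^T R a, where R is the Toeplitz autocorrelation matrix built from r."""
    p = len(a)
    return sum(a[i] * a[j] * r[abs(i - j)] for i in range(p) for j in range(p))


def itakura_distance(ref: Frame, test: Frame, order: int = 10) -> float:
    """Log ratio of residual energies on the reference frame: test-frame LPC
    model over reference-frame LPC model. Non-negative and asymmetric."""
    r = autocorr(ref, order)
    a_ref = levinson(r, order)
    a_test = levinson(autocorr(test, order), order)
    return math.log(quad_form(a_test, r) / quad_form(a_ref, r))


def dtw(x: Sequence[Frame], y: Sequence[Frame],
        dist: Callable[[Frame, Frame], float]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW with steps (1,0), (0,1), (1,1). x is the reference sequence.
    Returns the accumulated cost and the frame alignment path."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(x[i - 1], y[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack from (n, m); ties prefer the diagonal step.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = min((D[i - 1][j - 1], 0), (D[i - 1][j], 1), (D[i][j - 1], 2))[1]
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return D[n][m], path
```

Passing `itakura_distance` instead of `log_spectral_distance` as `dist` aligns time-domain frames instead of spectra; because the Itakura measure is asymmetric, the argument order (reference frame first) matters.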

Original language: English (US)
Title of host publication: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
Publisher: International Speech Communication Association
Pages: 30-33
Number of pages: 4
State: Published - 2010

Publication series

Name: Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

Keywords

  • Articulatory phonology
  • Gestures
  • TADA model
  • Time warping
  • Vocal tract variables
  • X-ray microbeam data

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
