Kernel metric learning for phonetic classification

Jui Ting Huang, Xi Zhou, Mark Allan Hasegawa-Johnson, Thomas S Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

While a spoken sound is described by a handful of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework to automatically emphasize the speech frames that are relevant to phonetic information. We jointly learn the importance of speech frames with a distance metric across the phone classes, attempting to satisfy a large-margin constraint: the distance from a segment to its correct class should be less than its distance to any other phone class by the largest possible margin. Furthermore, a universal background model structure is proposed to give the correspondence between statistical models of phone types and tokens, allowing us to use statistical models of each phone token in a large-margin speech recognition framework. Experiments on the TIMIT database demonstrate the effectiveness of our framework.
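
The large-margin constraint in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the paper works with a kernel metric and statistical models tied through a universal background model, whereas the sketch below substitutes a plain weighted Euclidean distance to fixed class prototypes. The names `weighted_distance`, `margin_violation`, `frame_weights`, and `margin`, as well as the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np


def weighted_distance(segment, prototype, frame_weights):
    """Weighted sum of per-frame squared Euclidean distances.

    segment, prototype: arrays of shape (n_frames, n_dims).
    frame_weights: array of shape (n_frames,), the importance of each frame.
    """
    per_frame = np.sum((segment - prototype) ** 2, axis=1)
    return float(np.dot(frame_weights, per_frame))


def margin_violation(segment, prototypes, label, frame_weights, margin=1.0):
    """Hinge-style penalty, summed over all wrong classes.

    The penalty is zero only when the distance to the correct class is
    smaller than the distance to every other class by at least `margin`.
    """
    d = np.array([weighted_distance(segment, p, frame_weights) for p in prototypes])
    wrong = np.delete(d, label)  # distances to all classes except the correct one
    return float(np.sum(np.maximum(0.0, margin + d[label] - wrong)))


# Toy data (assumed, not from the paper): 3 frames, 2 dimensions, 2 classes.
segment = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
prototypes = [
    np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]),  # class 0 (correct)
    np.array([[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]]),  # class 1
]

uniform = np.full(3, 1.0 / 3.0)          # every frame counts equally
emphasized = np.array([0.5, 0.5, 0.0])   # the last frame is ignored

loss_uniform = margin_violation(segment, prototypes, 0, uniform)
loss_emphasized = margin_violation(segment, prototypes, 0, emphasized)
print(loss_uniform, loss_emphasized)
```

With uniform weights the uninformative last frame pulls the segment toward the wrong class and the constraint is violated. Down-weighting that frame satisfies the margin, which is the intuition behind learning frame importance jointly with the metric.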

Original language: English (US)
Title of host publication: Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009
Pages: 141-145
Number of pages: 5
DOIs
State: Published - Dec 1 2009
Externally published: Yes
Event: 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009 - Merano, Italy
Duration: Dec 13 2009 to Dec 17 2009

Publication series

Name: Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009

Other

Other: 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009
Country/Territory: Italy
City: Merano
Period: 12/13/09 to 12/17/09

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Signal Processing
