TY - GEN
T1 - Kernel metric learning for phonetic classification
AU - Huang, Jui Ting
AU - Zhou, Xi
AU - Hasegawa-Johnson, Mark
AU - Huang, Thomas
PY - 2009
Y1 - 2009
N2 - While a spoken sound is described by a handful of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework to automatically emphasize important speech frames relevant to phonetic information. We jointly learn the importance of speech frames by a distance metric across the phone classes, attempting to satisfy a large margin constraint: the distance from a segment to its correct label class should be less than the distance to any other phone class by the largest possible margin. Furthermore, a universal background model structure is proposed to give the correspondence between statistical models of phone types and tokens, allowing us to use statistical models of each phone token in a large margin speech recognition framework. Experiments on the TIMIT database demonstrated the effectiveness of our framework.
AB - While a spoken sound is described by a handful of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework to automatically emphasize important speech frames relevant to phonetic information. We jointly learn the importance of speech frames by a distance metric across the phone classes, attempting to satisfy a large margin constraint: the distance from a segment to its correct label class should be less than the distance to any other phone class by the largest possible margin. Furthermore, a universal background model structure is proposed to give the correspondence between statistical models of phone types and tokens, allowing us to use statistical models of each phone token in a large margin speech recognition framework. Experiments on the TIMIT database demonstrated the effectiveness of our framework.
UR - http://www.scopus.com/inward/record.url?scp=77949383728&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77949383728&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2009.5373389
DO - 10.1109/ASRU.2009.5373389
M3 - Conference contribution
AN - SCOPUS:77949383728
SN - 9781424454792
T3 - Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009
SP - 141
EP - 145
BT - Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009
T2 - 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009
Y2 - 13 December 2009 through 17 December 2009
ER -