TY - GEN
T1 - Phonetic landmark detection for automatic language identification
AU - Harwath, David
AU - Hasegawa-Johnson, Mark
N1 - Publisher Copyright: © 2010 Proceedings of the International Conference on Speech Prosody.
PY - 2010
Y1 - 2010
AB - This paper presents a method of augmenting shifted-delta cepstral coefficients (SDCCs) with the classification outputs of an array of support vector machines (SVMs) trained to detect a set of manner and place features on telephone speech. The SVM array allows for broad phoneme classification, and when this information is concatenated with SDCCs to form a hybrid feature vector for each acoustic frame, a set of Gaussian mixture models (GMMs) may be trained to perform automatic language identification (LID). The NTIMIT telephone-band speech corpus was used to train the SVM-based distinctive feature recognizers, while the NIST CallFriend telephone corpus was used for training and testing the rest of the system.
KW - Distinctive Features
KW - Gaussian Mixture Models
KW - Language Identification
KW - Support Vector Machines
UR - http://www.scopus.com/inward/record.url?scp=85089064480&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089064480&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85089064480
T3 - Proceedings of the International Conference on Speech Prosody
BT - 5th International Conference on Speech Prosody 2010
PB - International Speech Communication Association
T2 - 5th International Conference on Speech Prosody: Every Language, Every Style, SP 2010
Y2 - 10 May 2010 through 14 May 2010
ER -