Phonetic landmark detection for automatic language identification

David Harwath, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper presents a method of augmenting shifted-delta cepstral coefficients (SDCCs) with the classification outputs of an array of support vector machines (SVMs) trained to detect a set of manner and place features in telephone speech. The SVM array allows for broad phoneme classification; when its outputs are concatenated with the SDCCs to form a hybrid feature vector for each acoustic frame, a set of Gaussian mixture models (GMMs) can be trained to perform automatic language identification (LID). The NTIMIT telephone-band speech corpus was used to train the SVM-based distinctive feature recognizers, while the NIST CallFriend telephone corpus was used for training and testing the rest of the system.
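
The pipeline described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The shifted-delta parameters (d, P, k), the edge clamping, the diagonal-covariance GMM form, and all function names are assumptions made for illustration; the SVM detector outputs are taken as given per-frame scores, and GMM training is omitted (model parameters are assumed to be already estimated).

```python
import math

def shifted_delta(cepstra, d=1, P=3, k=7):
    """Stack k delta blocks per frame: c[t + i*P + d] - c[t + i*P - d].

    Indices that fall outside the utterance are clamped to the first or
    last frame (an assumption; other edge policies are possible).
    """
    T = len(cepstra)

    def frame_at(t):
        return cepstra[max(0, min(T - 1, t))]

    out = []
    for t in range(T):
        stacked = []
        for i in range(k):
            ahead = frame_at(t + i * P + d)
            behind = frame_at(t + i * P - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        out.append(stacked)
    return out

def hybrid_features(sdcc, svm_scores):
    """Concatenate each frame's SDCCs with that frame's SVM detector outputs."""
    if len(sdcc) != len(svm_scores):
        raise ValueError("SDCC and SVM score sequences must have equal length")
    return [list(s) + list(v) for s, v in zip(sdcc, svm_scores)]

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a diagonal-covariance GMM.

    gmm is a list of (weight, means, variances) tuples, one per component.
    """
    comps = []
    for weight, means, variances in gmm:
        ll = math.log(weight)
        for xi, m, v in zip(x, means, variances):
            ll -= 0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        comps.append(ll)
    peak = max(comps)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(c - peak) for c in comps))

def identify_language(frames, models):
    """Return the language whose GMM gives the highest mean frame log-likelihood."""
    scores = {
        lang: sum(gmm_frame_loglik(x, gmm) for x in frames) / len(frames)
        for lang, gmm in models.items()
    }
    return max(scores, key=scores.get), scores
```

In this sketch, an utterance would be processed by calling `shifted_delta` on its cepstral frames, passing the result and the per-frame SVM outputs to `hybrid_features`, and then scoring the hybrid frames with `identify_language` against one GMM per candidate language.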

Original language: English (US)
Title of host publication: 5th International Conference on Speech Prosody 2010
Publisher: International Speech Communication Association
ISBN (Electronic): 9780000000002
State: Published - 2010
Event: 5th International Conference on Speech Prosody: Every Language, Every Style, SP 2010 - Chicago, United States
Duration: May 10 2010 to May 14 2010

Publication series

Name: Proceedings of the International Conference on Speech Prosody
ISSN (Print): 2333-2042

Conference

Conference: 5th International Conference on Speech Prosody: Every Language, Every Style, SP 2010
Country/Territory: United States
City: Chicago
Period: 5/10/10 to 5/14/10

Keywords

  • Distinctive Features
  • Gaussian Mixture Models
  • Language Identification
  • Support Vector Machines

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
