Automatic speech recognition using probabilistic transcriptions in Swahili, Amharic, and Dinka

Amit Das, Preethi Jyothi, Mark Hasegawa-Johnson

Research output: Contribution to journalConference articlepeer-review

Abstract

In this study, we develop automatic speech recognition systems for three sub-Saharan African languages using probabilistic transcriptions collected from crowd workers who neither speak nor have any familiarity with the African languages. The three African languages in consideration are Swahili, Amharic, and Dinka. There is a language mismatch in this scenario. More specifically, utterances spoken in African languages were transcribed by crowd workers who were mostly native speakers of English. Due to this, such transcriptions are highly prone to label inaccuracies. First, we use a recently introduced technique called mismatched crowdsourcing which processes the raw crowd transcriptions to confusion networks. Next, we adapt both multilingual hidden Markov models (HMM) and deep neural network (DNN) models using the probabilistic transcriptions of the African languages. Finally, we report the results using both deterministic and probabilistic phone error rates (PER). Automatic speech recognition systems developed using this recipe are particularly useful for low resource languages where there is limited access to linguistic resources and/or transcribers in the native language.

Original languageEnglish (US)
Pages (from-to)3524-3528
Number of pages5
JournalProceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume08-12-September-2016
DOIs
StatePublished - 2016
Event17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: Sep 8 2016Sep 16 2016

Keywords

  • African languages
  • Cross-lingual speech recognition
  • Deep neural networks
  • Mismatched crowdsourcing

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Fingerprint

Dive into the research topics of 'Automatic speech recognition using probabilistic transcriptions in Swahili, Amharic, and Dinka'. Together they form a unique fingerprint.

Cite this