Automatic speech recognition using probabilistic transcriptions in Swahili, Amharic, and Dinka

Amit Das, Preethi Jyothi, Mark Allan Hasegawa-Johnson

Research output: Contribution to journal › Conference article

Abstract

In this study, we develop automatic speech recognition systems for three sub-Saharan African languages using probabilistic transcriptions collected from crowd workers who neither speak nor have any familiarity with the African languages. The three African languages under consideration are Swahili, Amharic, and Dinka. There is a language mismatch in this scenario: utterances spoken in the African languages were transcribed by crowd workers who were mostly native speakers of English. As a result, such transcriptions are highly prone to label inaccuracies. First, we use a recently introduced technique called mismatched crowdsourcing, which converts the raw crowd transcriptions into confusion networks. Next, we adapt both multilingual hidden Markov models (HMM) and deep neural network (DNN) models using the probabilistic transcriptions of the African languages. Finally, we report the results using both deterministic and probabilistic phone error rates (PER). Automatic speech recognition systems developed using this recipe are particularly useful for low-resource languages where there is limited access to linguistic resources and/or transcribers in the native language.
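To make two terms from the abstract concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a confusion network can be represented as a list of slots, each mapping candidate phones to probabilities, and it computes a conventional phone error rate as edit distance divided by reference length. The `EPS` symbol, the function names, and the toy network are illustrative assumptions. The abstract does not define the probabilistic PER, so only the deterministic case (scoring a single 1-best path) is shown.

```python
from typing import Dict, List, Sequence

EPS = "<eps>"  # assumed placeholder meaning "no phone in this slot"


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance with unit cost for substitution, insertion, deletion."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by the number of reference phones."""
    if not ref:
        raise ValueError("reference must contain at least one phone")
    return edit_distance(ref, hyp) / len(ref)


def one_best(network: List[Dict[str, float]]) -> List[str]:
    """Pick the most probable phone in each slot and drop empty-slot symbols."""
    path = [max(slot, key=slot.get) for slot in network]
    return [p for p in path if p != EPS]


# Toy confusion network (made-up probabilities, for illustration only).
network = [
    {"k": 0.6, "g": 0.4},
    {"a": 0.9, EPS: 0.1},
    {"t": 0.5, "d": 0.3, EPS: 0.2},
]
reference = ["g", "a", "d"]
hypothesis = one_best(network)
per = phone_error_rate(reference, hypothesis)
```

In this toy case the 1-best path disagrees with the reference in the first and last slots, even though the reference phones are present in the network with lower probability. That is the kind of information a probabilistic transcription keeps and a single deterministic label discards.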

Original language: English (US)
Pages (from-to): 3524-3528
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
DOIs: 10.21437/Interspeech.2016-657
State: Published - Jan 1 2016
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: Sep 8 2016 to Sep 16 2016

Fingerprint

Automatic Speech Recognition
Transcription
Speech recognition
Hidden Markov models
Linguistics
Labels
Resources
Language
African Languages
Amharic
Neural Network Model
Markov Model
Error Rate
Africa
Crowds
Scenarios

Keywords

  • African languages
  • Cross-lingual speech recognition
  • Deep neural networks
  • Mismatched crowdsourcing

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Automatic speech recognition using probabilistic transcriptions in Swahili, Amharic, and Dinka. / Das, Amit; Jyothi, Preethi; Hasegawa-Johnson, Mark Allan.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 08-12-September-2016, 01.01.2016, p. 3524-3528.

@article{ee17107527e94d99af705161595e1ff5,
title = "Automatic speech recognition using probabilistic transcriptions in Swahili, Amharic, and Dinka",
abstract = "In this study, we develop automatic speech recognition systems for three sub-Saharan African languages using probabilistic transcriptions collected from crowd workers who neither speak nor have any familiarity with the African languages. The three African languages in consideration are Swahili, Amharic, and Dinka. There is a language mismatch in this scenario. More specifically, utterances spoken in African languages were transcribed by crowd workers who were mostly native speakers of English. Due to this, such transcriptions are highly prone to label inaccuracies. First, we use a recently introduced technique called mismatched crowdsourcing which processes the raw crowd transcriptions to confusion networks. Next, we adapt both multilingual hidden Markov models (HMM) and deep neural network (DNN) models using the probabilistic transcriptions of the African languages. Finally, we report the results using both deterministic and probabilistic phone error rates (PER). Automatic speech recognition systems developed using this recipe are particularly useful for low resource languages where there is limited access to linguistic resources and/or transcribers in the native language.",
keywords = "African languages, Cross-lingual speech recognition, Deep neural networks, Mismatched crowdsourcing",
author = "Amit Das and Preethi Jyothi and Hasegawa-Johnson, {Mark Allan}",
year = "2016",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2016-657",
language = "English (US)",
volume = "08-12-September-2016",
pages = "3524--3528",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR

T1 - Automatic speech recognition using probabilistic transcriptions in Swahili, Amharic, and Dinka

AU - Das, Amit

AU - Jyothi, Preethi

AU - Hasegawa-Johnson, Mark Allan

PY - 2016/1/1

Y1 - 2016/1/1

N2 - In this study, we develop automatic speech recognition systems for three sub-Saharan African languages using probabilistic transcriptions collected from crowd workers who neither speak nor have any familiarity with the African languages. The three African languages in consideration are Swahili, Amharic, and Dinka. There is a language mismatch in this scenario. More specifically, utterances spoken in African languages were transcribed by crowd workers who were mostly native speakers of English. Due to this, such transcriptions are highly prone to label inaccuracies. First, we use a recently introduced technique called mismatched crowdsourcing which processes the raw crowd transcriptions to confusion networks. Next, we adapt both multilingual hidden Markov models (HMM) and deep neural network (DNN) models using the probabilistic transcriptions of the African languages. Finally, we report the results using both deterministic and probabilistic phone error rates (PER). Automatic speech recognition systems developed using this recipe are particularly useful for low resource languages where there is limited access to linguistic resources and/or transcribers in the native language.

AB - In this study, we develop automatic speech recognition systems for three sub-Saharan African languages using probabilistic transcriptions collected from crowd workers who neither speak nor have any familiarity with the African languages. The three African languages in consideration are Swahili, Amharic, and Dinka. There is a language mismatch in this scenario. More specifically, utterances spoken in African languages were transcribed by crowd workers who were mostly native speakers of English. Due to this, such transcriptions are highly prone to label inaccuracies. First, we use a recently introduced technique called mismatched crowdsourcing which processes the raw crowd transcriptions to confusion networks. Next, we adapt both multilingual hidden Markov models (HMM) and deep neural network (DNN) models using the probabilistic transcriptions of the African languages. Finally, we report the results using both deterministic and probabilistic phone error rates (PER). Automatic speech recognition systems developed using this recipe are particularly useful for low resource languages where there is limited access to linguistic resources and/or transcribers in the native language.

KW - African languages

KW - Cross-lingual speech recognition

KW - Deep neural networks

KW - Mismatched crowdsourcing

UR - http://www.scopus.com/inward/record.url?scp=84994205146&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84994205146&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2016-657

DO - 10.21437/Interspeech.2016-657

M3 - Conference article

VL - 08-12-September-2016

SP - 3524

EP - 3528

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -