Multitask Learning for Phone Recognition of Underresourced Languages Using Mismatched Transcription

Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark A. Hasegawa-Johnson

Research output: Contribution to journal › Article › peer-review


It is challenging to obtain large amounts of native (matched) labels for speech audio in underresourced languages. This challenge is often due to a lack of literate speakers of the language or, in extreme cases, the lack of a universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription. Instead of native speakers, it employs transcribers who do not speak the underresourced language of interest (the target language). These transcribers write down what they hear, as if it were nonsense speech, in their own annotation language (≠ target language). Previous uses of mismatched transcription converted it to a probabilistic transcription (PT), but PT is limited by the errors of nonnative perception. This paper instead proposes a multitask learning framework in which one deep neural network (DNN) is trained to optimize two separate tasks: acoustic modeling of a small amount of matched transcriptions, labeled with target-language graphemes; and acoustic modeling of a large amount of mismatched transcriptions, labeled with annotation-language graphemes. We find, first, that the multitask learning framework gives significant improvements over monolingual, semisupervised learning, multilingual DNN training, and transfer learning baselines; second, that a Gaussian mixture model-hidden Markov model (GMM-HMM) adapted using PT improves alignments, thereby improving training; and third, that bottleneck features trained on the mismatched transcriptions lead to even better alignments, resulting in further performance gains for the multitask DNN. Our experiments are conducted on the IARPA Georgian and Vietnamese BABEL corpora, as well as on our newly collected speech corpus of Singapore Hokkien, an underresourced language with no standard written form.
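The core idea of the multitask framework, one network whose hidden layers are shared by two tasks that each have their own output layer, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption for illustration and not the paper's configuration. This includes the single ReLU hidden layer, the layer sizes, the per-frame integer class labels, the 0.5 loss weight on the mismatched task, the alternating mini-batch schedule, and the synthetic data. The names `MultitaskDNN`, `task_loss`, and `demo` are likewise hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


class MultitaskDNN:
    """Hypothetical sketch: one shared hidden layer, one softmax head per task."""

    def __init__(self, n_in, n_hidden, n_target, n_annot, seed=0):
        rng = np.random.default_rng(seed)
        self.W_h = rng.normal(0, 0.1, (n_in, n_hidden))  # shared by both tasks
        self.b_h = np.zeros(n_hidden)
        self.heads = {  # task-specific output layers
            "target": [rng.normal(0, 0.1, (n_hidden, n_target)), np.zeros(n_target)],
            "annot": [rng.normal(0, 0.1, (n_hidden, n_annot)), np.zeros(n_annot)],
        }

    def predict(self, x, task):
        h = np.maximum(0.0, x @ self.W_h + self.b_h)
        W, b = self.heads[task]
        return softmax(h @ W + b)

    def step(self, x, y, task, lr=0.1, weight=1.0):
        """One SGD step on a batch from one task; gradients are scaled by `weight`.
        Updates that task's head and the shared layer; returns unweighted cross-entropy."""
        h_pre = x @ self.W_h + self.b_h
        h = np.maximum(0.0, h_pre)
        W, b = self.heads[task]
        p = softmax(h @ W + b)
        n = x.shape[0]
        loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
        d_logits = p.copy()
        d_logits[np.arange(n), y] -= 1.0  # softmax + cross-entropy gradient
        d_logits *= weight / n
        dW, db = h.T @ d_logits, d_logits.sum(axis=0)
        d_h = (d_logits @ W.T) * (h_pre > 0)  # backprop into the shared layer
        W -= lr * dW  # in-place, so the arrays stored in self.heads change
        b -= lr * db
        self.W_h -= lr * (x.T @ d_h)
        self.b_h -= lr * d_h.sum(axis=0)
        return loss


def task_loss(net, x, y, task):
    p = net.predict(x, task)
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()


def demo(steps=300, seed=1):
    rng = np.random.default_rng(seed)
    n_in = 20
    # Synthetic stand-ins: few "matched" frames, many "mismatched" frames.
    x_t = rng.normal(size=(100, n_in))
    y_t = (x_t @ rng.normal(size=(n_in, 5))).argmax(axis=1)
    x_a = rng.normal(size=(1000, n_in))
    y_a = (x_a @ rng.normal(size=(n_in, 8))).argmax(axis=1)
    net = MultitaskDNN(n_in, 32, n_target=5, n_annot=8)
    before = {"target": task_loss(net, x_t, y_t, "target"),
              "annot": task_loss(net, x_a, y_a, "annot")}
    for _ in range(steps):
        net.step(x_t, y_t, "target")  # matched data, target-language labels
        idx = rng.integers(0, len(x_a), size=100)
        net.step(x_a[idx], y_a[idx], "annot", weight=0.5)  # mismatched data
    after = {"target": task_loss(net, x_t, y_t, "target"),
             "annot": task_loss(net, x_a, y_a, "annot")}
    return before, after
```

Calling `before, after = demo()` trains both heads through the shared layer, so updates driven by the large mismatched set also shape the representation used by the small matched task. Alternating batches between tasks is only one common way to train such a network; the abstract does not specify the paper's schedule or loss weighting.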

Original language: English (US)
Article number: 8186239
Pages (from-to): 501-514
Number of pages: 14
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Issue number: 3
State: Published - Mar 2018

Keywords

  • Phone recognition
  • mismatched transcription
  • multi-task learning
  • probabilistic transcription
  • under-resourced languages

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Acoustics and Ultrasonics
  • Computational Mathematics
  • Electrical and Electronic Engineering

