Multi-task learning using mismatched transcription for under-resourced speech recognition

Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson

Research output: Contribution to journal › Conference article › peer-review

Abstract

It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This may be due to a lack of literate speakers of the language or a lack of a universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the target language (in place of native speakers) to write down what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework in which a DNN acoustic model is trained simultaneously on a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that this multi-task framework yields improvements over monolingual baselines and over previously proposed mismatched-transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted with mismatched transcription further improves acoustic modeling performance. Experiments on Georgian data from the IARPA Babel program demonstrate the effectiveness of the proposed method.
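The core idea of the abstract — a shared acoustic model trained jointly on two label streams — can be sketched as follows. This is a minimal illustrative example, not the paper's implementation: all layer sizes, class counts, and the weighting parameter `alpha` are hypothetical, and a single tanh layer stands in for the paper's DNN. One shared representation feeds two softmax heads, one for native (matched) labels and one for mismatched-transcription labels, and the training loss is a weighted sum of the two cross-entropies.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable row-wise softmax
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class per frame
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# Hypothetical dimensions: 40-dim acoustic features, 64 shared hidden units,
# 30 native phone classes, 50 mismatched (e.g., Mandarin syllable) classes.
d_in, d_hid, n_native, n_mm = 40, 64, 30, 50
W_shared = rng.normal(scale=0.1, size=(d_in, d_hid))   # shared layer (both tasks)
W_native = rng.normal(scale=0.1, size=(d_hid, n_native))  # matched-label head
W_mm = rng.normal(scale=0.1, size=(d_hid, n_mm))          # mismatched-label head

def multitask_loss(x_nat, y_nat, x_mm, y_mm, alpha=0.5):
    """Weighted sum of the two task losses over a shared representation."""
    h_nat = np.tanh(x_nat @ W_shared)
    h_mm = np.tanh(x_mm @ W_shared)
    loss_nat = cross_entropy(softmax(h_nat @ W_native), y_nat)
    loss_mm = cross_entropy(softmax(h_mm @ W_mm), y_mm)
    return alpha * loss_nat + (1 - alpha) * loss_mm

# Toy minibatches: 8 frames per task with random features and labels
x_nat = rng.normal(size=(8, d_in)); y_nat = rng.integers(0, n_native, 8)
x_mm = rng.normal(size=(8, d_in)); y_mm = rng.integers(0, n_mm, 8)
loss = multitask_loss(x_nat, y_nat, x_mm, y_mm)
```

Because gradients from both heads flow through `W_shared`, the larger mismatched-transcription set regularizes the representation that the data-poor native task relies on, which is the motivation for the multi-task setup.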

Original language: English (US)
Pages (from-to): 734-738
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
State: Published - 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: Aug 20, 2017 - Aug 24, 2017

Keywords

  • Low resourced languages
  • Mismatched transcription
  • Multi-task learning
  • Probabilistic transcription

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation
