Speech recognition of under-resourced languages using mismatched transcriptions

Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Mismatched crowdsourcing is a technique for deriving speech transcriptions from crowd workers who are unfamiliar with the language being spoken. The technique is especially useful for under-resourced languages, for which native transcribers are hard to hire. In this paper, we demonstrate that using mismatched transcriptions for adaptation improves speech recognition performance when matched training data is limited. In addition, we show that data augmentation not only improves the performance of the monolingual system but also makes mismatched transcription adaptation more effective.

Original language: English (US)
Title of host publication: Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016
Editors: Minghui Dong, Chung-Hsien Wu, Yanfeng Lu, Haizhou Li, Yuen-Hsien Tseng, Liang-Chih Yu, Lung-Hao Lee
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 112-115
Number of pages: 4
ISBN (Electronic): 9781509009213
State: Published - Mar 10 2017
Event: 20th International Conference on Asian Language Processing, IALP 2016 - Tainan, Taiwan, Province of China
Duration: Nov 21 2016 → Nov 23 2016

Publication series

Name: Proceedings of the 2016 International Conference on Asian Language Processing, IALP 2016

Other

Other: 20th International Conference on Asian Language Processing, IALP 2016
Country/Territory: Taiwan, Province of China
City: Tainan
Period: 11/21/16 → 11/23/16

Keywords

  • data augmentation
  • mismatched transcription
  • model adaptation
  • speech recognition
  • under-resourced language

ASJC Scopus subject areas

  • Signal Processing
  • Computer Vision and Pattern Recognition
  • Linguistics and Language
  • Artificial Intelligence
