Multi-task learning using mismatched transcription for under-resourced speech recognition

Van Hai Do, Nancy F. Chen, Boon Pang Lim, Mark Allan Hasegawa-Johnson

Research output: Contribution to journal › Conference article

Abstract

It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers), to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework where the DNN acoustic model is simultaneously trained using both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.
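The multi-task setup described in the abstract (a shared DNN body with separate output layers, one supervised by native transcription and one by mismatched transcription) can be sketched roughly as follows. This is an illustrative numpy sketch, not the paper's implementation: the layer sizes, tanh nonlinearity, label-set sizes, and the interpolation weight `alpha` are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Shared hidden layer stands in for the DNN body over acoustic features.
# Dimensions are illustrative, not taken from the paper.
n_feat, n_hidden = 40, 64
n_native, n_mismatched = 30, 20   # e.g. target-language phones vs. cross-language labels

W_shared = rng.normal(scale=0.1, size=(n_feat, n_hidden))
W_native = rng.normal(scale=0.1, size=(n_hidden, n_native))
W_mism   = rng.normal(scale=0.1, size=(n_hidden, n_mismatched))

def forward(x):
    # One shared representation feeds two task-specific softmax heads.
    h = np.tanh(x @ W_shared)
    return softmax(h @ W_native), softmax(h @ W_mism)

def multitask_loss(x, y_native, y_mism, alpha=0.7):
    # Weighted sum of per-task cross-entropies; alpha is a hypothetical
    # task-weighting hyperparameter, not a value from the paper.
    p_nat, p_mis = forward(x)
    idx = np.arange(len(x))
    ce_nat = -np.log(p_nat[idx, y_native]).mean()
    ce_mis = -np.log(p_mis[idx, y_mism]).mean()
    return alpha * ce_nat + (1 - alpha) * ce_mis

# Toy minibatch: random "frames" with one label per supervision stream.
x = rng.normal(size=(8, n_feat))
y_nat = rng.integers(0, n_native, size=8)
y_mis = rng.integers(0, n_mismatched, size=8)
loss = multitask_loss(x, y_nat, y_mis)
```

In training, gradients from both heads flow into `W_shared`, so the larger mismatched-transcription set regularizes the representation that the small native-transcription head depends on.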

Original language: English (US)
Pages (from-to): 734-738
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOI: 10.21437/Interspeech.2017-788
State: Published - Jan 1 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: Aug 20 2017 - Aug 24 2017

Keywords

  • Low resourced languages
  • Mismatched transcription
  • Multi-task learning
  • Probabilistic transcription

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Multi-task learning using mismatched transcription for under-resourced speech recognition. / Do, Van Hai; Chen, Nancy F.; Lim, Boon Pang; Hasegawa-Johnson, Mark Allan.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2017-August, 01.01.2017, p. 734-738.

Research output: Contribution to journal › Conference article

@article{91b95ec442fa46e9a90fcc086bf2094c,
title = "Multi-task learning using mismatched transcription for under-resourced speech recognition",
abstract = "It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers), to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework where the DNN acoustic model is simultaneously trained using both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.",
keywords = "Low resourced languages, Mismatched transcription, Multi-task learning, Probabilistic transcription",
author = "Do, {Van Hai} and Chen, {Nancy F.} and Lim, {Boon Pang} and Hasegawa-Johnson, {Mark Allan}",
year = "2017",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2017-788",
language = "English (US)",
volume = "2017-August",
pages = "734--738",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR

T1 - Multi-task learning using mismatched transcription for under-resourced speech recognition

AU - Do, Van Hai

AU - Chen, Nancy F.

AU - Lim, Boon Pang

AU - Hasegawa-Johnson, Mark Allan

PY - 2017/1/1

Y1 - 2017/1/1

N2 - It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers), to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework where the DNN acoustic model is simultaneously trained using both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.

AB - It is challenging to obtain large amounts of native (matched) labels for audio in under-resourced languages. This could be due to a lack of literate speakers of the language or a lack of universally acknowledged orthography. One solution is to increase the amount of labeled data by using mismatched transcription, which employs transcribers who do not speak the language (in place of native speakers), to transcribe what they hear as nonsense speech in their own language (e.g., Mandarin). This paper presents a multi-task learning framework where the DNN acoustic model is simultaneously trained using both a limited amount of native (matched) transcription and a larger set of mismatched transcription. We find that by using a multi-task learning framework, we achieve improvements over monolingual baselines and previously proposed mismatched transcription adaptation techniques. In addition, we show that using alignments provided by a GMM adapted by mismatched transcription further improves acoustic modeling performance. Our experiments on Georgian data from the IARPA Babel program show the effectiveness of the proposed method.

KW - Low resourced languages

KW - Mismatched transcription

KW - Multi-task learning

KW - Probabilistic transcription

UR - http://www.scopus.com/inward/record.url?scp=85039162023&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85039162023&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2017-788

DO - 10.21437/Interspeech.2017-788

M3 - Conference article

VL - 2017-August

SP - 734

EP - 738

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -