Mismatched crowdsourcing from multiple annotator languages for recognizing zero-resourced languages: A nullspace clustering approach

Wenda Chen, Mark Allan Hasegawa-Johnson, Nancy F. Chen, Boon Pang Lim

Research output: Contribution to journal › Conference article

Abstract

It is extremely challenging to create training labels for building acoustic models of zero-resourced languages, for which the conventional resources required for model training - lexicons, transcribed audio, or in extreme cases even an orthographic system or a viable phone set design for the language - are unavailable. Here, language-mismatched transcripts, in which audio is transcribed in the orthographic system of a completely different language, possibly by non-speakers of the target language, may play a vital role. Such mismatched transcripts have recently been obtained successfully through crowdsourcing and shown to benefit ASR performance. This paper further studies the problem of using mismatched crowdsourced transcripts for a tonal language that has no standard orthography and whose phoneme inventory may be unknown. It proposes methods to project the multilingual mismatched transcriptions of a tonal language onto target phone segments. Results on Cantonese and Singapore Hokkien show that the accuracies of the reconstructed phone sequences improve by more than 3% absolute over those of previously proposed monolingual probabilistic transcription methods.
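The core idea in the abstract - combining transcriptions from several annotator languages so their disagreements jointly narrow down the target phone - can be illustrated in a toy form. The sketch below uses a product-of-experts fusion of per-language posteriors as a simplified stand-in; the phone set and probability values are invented for illustration, and this is not the paper's nullspace clustering algorithm.

```python
# Toy illustration of fusing mismatched transcriptions from multiple
# annotator languages. Each annotator language yields a noisy posterior
# over a hypothesized target phone inventory; multiplying and
# renormalizing the posteriors sharpens the estimate where the
# languages agree. All values here are invented for illustration.
from math import prod

PHONES = ["p", "t", "k", "a", "i"]  # hypothesized target inventory

def fuse(posteriors):
    """Combine per-annotator-language posteriors over the same audio
    segment with a normalized product (product-of-experts fusion)."""
    raw = [prod(p[i] for p in posteriors) for i in range(len(PHONES))]
    z = sum(raw)
    return [r / z for r in raw]

# Two annotator languages perceived the same segment differently:
mandarin_view = [0.50, 0.30, 0.10, 0.05, 0.05]
english_view  = [0.20, 0.55, 0.15, 0.05, 0.05]

fused = fuse([mandarin_view, english_view])
best = PHONES[max(range(len(PHONES)), key=fused.__getitem__)]
# Each language alone prefers a different phone, but the fused
# posterior concentrates on "t", where neither view objects strongly.
```

Here each single-language view is ambiguous, but their combination is more peaked - the intuition behind using multiple annotator languages rather than one.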

Original language: English (US)
Pages (from-to): 2789-2793
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2017-August
DOI: 10.21437/Interspeech.2017-1567
State: Published - Jan 1 2017
Event: 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden
Duration: Aug 20 2017 - Aug 24 2017

Keywords

  • Automatic Speech Recognition
  • Mismatched Crowdsourcing And Perception
  • Zero-Resourced Languages

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

@article{bf1b641f1a064e00aead0996b684fb4b,
title = "Mismatched crowdsourcing from multiple annotator languages for recognizing zero-resourced languages: A nullspace clustering approach",
abstract = "It is extremely challenging to create training labels for building acoustic models of zero-resourced languages, for which the conventional resources required for model training - lexicons, transcribed audio, or in extreme cases even an orthographic system or a viable phone set design for the language - are unavailable. Here, language-mismatched transcripts, in which audio is transcribed in the orthographic system of a completely different language, possibly by non-speakers of the target language, may play a vital role. Such mismatched transcripts have recently been obtained successfully through crowdsourcing and shown to benefit ASR performance. This paper further studies the problem of using mismatched crowdsourced transcripts for a tonal language that has no standard orthography and whose phoneme inventory may be unknown. It proposes methods to project the multilingual mismatched transcriptions of a tonal language onto target phone segments. Results on Cantonese and Singapore Hokkien show that the accuracies of the reconstructed phone sequences improve by more than 3{\%} absolute over those of previously proposed monolingual probabilistic transcription methods.",
keywords = "Automatic Speech Recognition, Mismatched Crowdsourcing And Perception, Zero-Resourced Languages",
author = "Wenda Chen and Hasegawa-Johnson, {Mark Allan} and Chen, {Nancy F.} and Lim, {Boon Pang}",
year = "2017",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2017-1567",
language = "English (US)",
volume = "2017-August",
pages = "2789--2793",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Mismatched crowdsourcing from multiple annotator languages for recognizing zero-resourced languages

T2 - A nullspace clustering approach

AU - Chen, Wenda

AU - Hasegawa-Johnson, Mark Allan

AU - Chen, Nancy F.

AU - Lim, Boon Pang

PY - 2017/1/1

Y1 - 2017/1/1

N2 - It is extremely challenging to create training labels for building acoustic models of zero-resourced languages, for which the conventional resources required for model training - lexicons, transcribed audio, or in extreme cases even an orthographic system or a viable phone set design for the language - are unavailable. Here, language-mismatched transcripts, in which audio is transcribed in the orthographic system of a completely different language, possibly by non-speakers of the target language, may play a vital role. Such mismatched transcripts have recently been obtained successfully through crowdsourcing and shown to benefit ASR performance. This paper further studies the problem of using mismatched crowdsourced transcripts for a tonal language that has no standard orthography and whose phoneme inventory may be unknown. It proposes methods to project the multilingual mismatched transcriptions of a tonal language onto target phone segments. Results on Cantonese and Singapore Hokkien show that the accuracies of the reconstructed phone sequences improve by more than 3% absolute over those of previously proposed monolingual probabilistic transcription methods.

AB - It is extremely challenging to create training labels for building acoustic models of zero-resourced languages, for which the conventional resources required for model training - lexicons, transcribed audio, or in extreme cases even an orthographic system or a viable phone set design for the language - are unavailable. Here, language-mismatched transcripts, in which audio is transcribed in the orthographic system of a completely different language, possibly by non-speakers of the target language, may play a vital role. Such mismatched transcripts have recently been obtained successfully through crowdsourcing and shown to benefit ASR performance. This paper further studies the problem of using mismatched crowdsourced transcripts for a tonal language that has no standard orthography and whose phoneme inventory may be unknown. It proposes methods to project the multilingual mismatched transcriptions of a tonal language onto target phone segments. Results on Cantonese and Singapore Hokkien show that the accuracies of the reconstructed phone sequences improve by more than 3% absolute over those of previously proposed monolingual probabilistic transcription methods.

KW - Automatic Speech Recognition

KW - Mismatched Crowdsourcing And Perception

KW - Zero-Resourced Languages

UR - http://www.scopus.com/inward/record.url?scp=85039149729&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85039149729&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2017-1567

DO - 10.21437/Interspeech.2017-1567

M3 - Conference article

AN - SCOPUS:85039149729

VL - 2017-August

SP - 2789

EP - 2793

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -