Abstract
It is extremely challenging to create training labels for building acoustic models of zero-resourced languages, in which the conventional resources required for model training - lexicons, transcribed audio, or in extreme cases even an orthographic system or a viable phone set design for the language - are unavailable. Here, language-mismatched transcripts, in which audio is transcribed in the orthographic system of a completely different language, possibly by non-speakers of the target language, may play a vital role. Such mismatched transcripts have recently been obtained successfully through crowdsourcing and shown to benefit ASR performance. This paper further studies the problem of using mismatched crowdsourced transcripts for a tonal language that has no standard orthography and whose phoneme inventory may not even be known. It proposes methods to project multilingual mismatched transcriptions of a tonal language onto target phone segments. Results on Cantonese and Singapore Hokkien show that the accuracy of the reconstructed phone sequences improves by more than 3% absolute over previously proposed monolingual probabilistic transcription methods.
| Original language | English (US) |
| --- | --- |
| Pages (from-to) | 2789-2793 |
| Number of pages | 5 |
| Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
| Volume | 2017-August |
| DOIs | |
| State | Published - 2017 |
| Event | 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017 - Stockholm, Sweden. Duration: Aug 20 2017 → Aug 24 2017 |
Keywords
- Automatic Speech Recognition
- Mismatched Crowdsourcing and Perception
- Zero-Resourced Languages
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation