Grapheme-to-phoneme transduction for cross-language ASR

Mark Hasegawa-Johnson, Leanne Rolston, Camille Goudeseune, Gina Anne Levow, Katrin Kirchhoff

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic speech recognition (ASR) can be deployed in a previously unknown language, in less than 24 h, given just three resources: an acoustic model trained on other languages, a set of language-model training data, and a grapheme-to-phoneme (G2P) transducer to connect them. The LanguageNet G2Ps were created with the goal of being small, fast, and easy to port to a previously unseen language. Data come from pronunciation lexicons if available, but if there are no pronunciation lexicons in the target language, then data are generated from minimal resources: from a Wikipedia description of the target language, or from a one-hour interview with a native speaker of the language. Using such methods, the LanguageNet G2Ps now include simple models in nearly 150 languages, with trained finite state transducers in 122 languages, 59 of which are sufficiently well-resourced to permit measurement of their phone error rates. This paper proposes a measure of the distance between the G2Ps in different languages, and demonstrates that agglomerative clustering of the LanguageNet languages bears some resemblance to a phylogeographic language family tree. The LanguageNet G2Ps proposed in this paper have already been applied in three cross-language ASRs, using both hybrid and end-to-end neural architectures, and further experiments are ongoing.
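The abstract relies on two quantities: a G2P mapping from a word's spelling to a phone sequence, and the phone error rate (PER) used to evaluate that mapping. The Python sketch below makes both concrete. It assumes a toy, hand-written rule table and greedy longest-match rewriting as a simple stand-in for a trained finite-state transducer. The function names, the rule table, and the example word are hypothetical illustrations, not the LanguageNet models or their file format. PER is computed in the standard way, as the Levenshtein distance between reference and hypothesis phone strings divided by the reference length.

```python
from typing import Dict, List, Sequence


def g2p_longest_match(word: str, rules: Dict[str, List[str]]) -> List[str]:
    """Hypothetical toy G2P: scan left to right and rewrite the longest
    grapheme sequence that has a rule. This stands in for a trained FST;
    graphemes with no rule are copied through unchanged."""
    max_len = max((len(g) for g in rules), default=1)
    phones: List[str] = []
    i = 0
    while i < len(word):
        # Try the longest candidate chunk first, then shorter ones.
        for n in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + n]
            if chunk in rules:
                phones.extend(rules[chunk])
                i += n
                break
        else:
            # No rule matched at position i: pass the grapheme through.
            phones.append(word[i])
            i += 1
    return phones


def phone_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """(substitutions + deletions + insertions) / len(ref), computed with
    the standard Levenshtein dynamic program over phone symbols."""
    if not ref:
        raise ValueError("reference phone sequence must be non-empty")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference phone r
                cur[j - 1] + 1,          # insertion of hypothesis phone h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1] / len(ref)


# Illustrative, hypothetical rules: the digraph "ch" maps to one phone.
TOY_RULES = {"ch": ["tʃ"], "c": ["k"], "i": ["i"], "a": ["a"], "o": ["o"]}

if __name__ == "__main__":
    ref = g2p_longest_match("chica", TOY_RULES)
    print(ref)                                                  # ['tʃ', 'i', 'k', 'a']
    print(phone_error_rate(ref, ["tʃ", "i", "k", "o"]))         # 0.25
    print(phone_error_rate(ref, ["t", "ʃ", "i", "k", "a"]))     # 0.5
```

Longest-match rewriting is used here only because it fits in a few lines and handles digraphs such as "ch"; a real transducer can also weight competing pronunciations and condition on wider context.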

Original language: English (US)
Title of host publication: Statistical Language and Speech Processing - 8th International Conference, SLSP 2020, Proceedings
Editors: Luis Espinosa-Anke, Irena Spasic, Carlos Martín-Vide
Publisher: Springer
Pages: 3-19
Number of pages: 17
ISBN (Print): 9783030594299
State: Published - 2020
Event: 8th International Conference on Statistical Language and Speech Processing, SLSP 2020 - Cardiff, United Kingdom
Duration: Oct 14 2020 to Oct 16 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12379 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 8th International Conference on Statistical Language and Speech Processing, SLSP 2020
Country/Territory: United Kingdom
City: Cardiff
Period: 10/14/20 to 10/16/20

Keywords

  • Automatic speech recognition
  • Cross-language speech recognition
  • Grapheme-to-phoneme transducers
  • Under-resourced languages

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
