TY - JOUR
T1 - Domain Generalization for Language-Independent Automatic Speech Recognition
AU - Gao, Heting
AU - Ni, Junrui
AU - Zhang, Yang
AU - Qian, Kaizhi
AU - Chang, Shiyu
AU - Hasegawa-Johnson, Mark
N1 - Publisher Copyright:
Copyright © 2022 Gao, Ni, Zhang, Qian, Chang and Hasegawa-Johnson.
PY - 2022/5/12
Y1 - 2022/5/12
N2 - A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than the languages in which it was trained. Language-independent ASR is difficult to train, because different languages implement phones differently: even when phonemes in two different languages are written using the same symbols in the International Phonetic Alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that the goal of language independence may be approximated in different ways, depending on the size of the training set, the presence vs. absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and when every language in the test set is related to (shares the same language family with) a language in the training set, then language-independent ASR may be trained using an empirical risk minimization strategy (e.g., using connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from the same language family, then the best performance is achieved by using domain-invariant representation learning strategies. Two different representation learning strategies are tested in this article: invariant risk minimization and regret minimization. We find that invariant risk minimization is better at the task of phone token classification (given known segment boundary times), while regret minimization is better at the task of phone token recognition.
AB - A language-independent automatic speech recognizer (ASR) is one that can be used for phonetic transcription in languages other than the languages in which it was trained. Language-independent ASR is difficult to train, because different languages implement phones differently: even when phonemes in two different languages are written using the same symbols in the International Phonetic Alphabet, they are differentiated by different distributions of language-dependent redundant articulatory features. This article demonstrates that the goal of language independence may be approximated in different ways, depending on the size of the training set, the presence vs. absence of familial relationships between the training and test languages, and the method used to implement phone recognition or classification. When the training set contains many languages, and when every language in the test set is related to (shares the same language family with) a language in the training set, then language-independent ASR may be trained using an empirical risk minimization strategy (e.g., using connectionist temporal classification without extra regularizers). When the training set is limited to a small number of languages from one language family, however, and the test languages are not from the same language family, then the best performance is achieved by using domain-invariant representation learning strategies. Two different representation learning strategies are tested in this article: invariant risk minimization and regret minimization. We find that invariant risk minimization is better at the task of phone token classification (given known segment boundary times), while regret minimization is better at the task of phone token recognition.
KW - automatic speech recognition
KW - distributionally robust optimization
KW - domain generalization
KW - invariant risk minimization
KW - regret minimization
KW - under-resourced languages
UR - http://www.scopus.com/inward/record.url?scp=85131168460&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131168460&partnerID=8YFLogxK
U2 - 10.3389/frai.2022.806274
DO - 10.3389/frai.2022.806274
M3 - Article
C2 - 35647534
AN - SCOPUS:85131168460
SN - 2624-8212
VL - 5
JO - Frontiers in Artificial Intelligence
JF - Frontiers in Artificial Intelligence
M1 - 806274
ER -