TY - JOUR
T1 - An investigation on training deep neural networks using probabilistic transcriptions
AU - Das, Amit
AU - Hasegawa-Johnson, Mark
N1 - Publisher Copyright:
Copyright © 2016 ISCA. Grant: The work reported here was started at the JSALT 2015 workshop at the University of Washington, Seattle, and was partly supported by Johns Hopkins University via grants from Google, Microsoft, Amazon, Mitsubishi Electric, and MERL. The authors thank Paul Hager (Massachusetts Institute of Technology) and Karel Veselý (Brno University of Technology) for discussions.
PY - 2016
Y1 - 2016
N2 - In this study, a transfer learning technique is presented for cross-lingual speech recognition in an adverse scenario in which no native transcriptions are available in the target language. The only transcriptions available during training are produced by crowd workers who neither speak the target language nor have any familiarity with it; such transcriptions are therefore likely to be inaccurate. Training a deep neural network (DNN) in this scenario is challenging: previously reported results have described DNN error rates exceeding the error rate of an adapted Gaussian Mixture Model (GMM). This paper investigates multi-task learning techniques using deep neural networks that are suited to this scenario. We report, for the first time, absolute improvements in phone error rate (PER) in the range 1.3-6.2% over GMMs adapted to probabilistic transcriptions. Results are reported for Swahili, Hungarian, and Mandarin.
AB - In this study, a transfer learning technique is presented for cross-lingual speech recognition in an adverse scenario in which no native transcriptions are available in the target language. The only transcriptions available during training are produced by crowd workers who neither speak the target language nor have any familiarity with it; such transcriptions are therefore likely to be inaccurate. Training a deep neural network (DNN) in this scenario is challenging: previously reported results have described DNN error rates exceeding the error rate of an adapted Gaussian Mixture Model (GMM). This paper investigates multi-task learning techniques using deep neural networks that are suited to this scenario. We report, for the first time, absolute improvements in phone error rate (PER) in the range 1.3-6.2% over GMMs adapted to probabilistic transcriptions. Results are reported for Swahili, Hungarian, and Mandarin.
KW - Cross-lingual speech recognition
KW - Deep neural networks
KW - Probabilistic transcription
KW - Transfer learning
UR - http://www.scopus.com/inward/record.url?scp=84994309649&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84994309649&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2016-655
DO - 10.21437/Interspeech.2016-655
M3 - Conference article
AN - SCOPUS:84994309649
SN - 2308-457X
VL - 08-12-September-2016
SP - 3858
EP - 3862
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016
Y2 - 8 September 2016 through 12 September 2016
ER -