Abstract
In this study, transfer learning techniques are presented for cross-lingual speech recognition that mitigate the limited availability of data in a target language by using data from richly resourced source languages. A maximum likelihood (ML) based regularization criterion is used to learn context-dependent Gaussian mixture model (GMM) based hidden Markov model (HMM) parameters for phones in the target language using data from both the target and source languages. Recognition results indicate improved HMM state alignments. The hidden layers of a deep neural network (DNN) are then initialized using unsupervised pre-training of a multilingual deep belief network (DBN). First, the DNN is fine-tuned using a modified cross entropy criterion that jointly uses HMM state alignments from both the target and source languages. Second, another DNN fine-tuning technique is explored in which training proceeds sequentially: the source language first, followed by the target language. Experiments conducted with varying amounts of target data indicate that both joint and sequential DNN training improve performance over existing techniques. Turkish and English were chosen as the target and source languages, respectively.
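The joint fine-tuning idea above can be illustrated with a minimal sketch. The abstract does not give the exact form of the modified criterion, so this assumes a simple weighted combination of per-language frame-level cross entropies against one-hot HMM state alignments; the function names, the weighting parameter `alpha`, and the combination form are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def cross_entropy(posteriors, state_targets):
    """Mean frame-level cross entropy of DNN state posteriors against
    a forced alignment (one HMM state index per frame)."""
    n = len(state_targets)
    # Pick out the posterior of the aligned state for each frame.
    return -np.mean(np.log(posteriors[np.arange(n), state_targets] + 1e-12))

def joint_cross_entropy(tgt_post, tgt_states, src_post, src_states, alpha=0.5):
    """Weighted sum of target- and source-language cross entropies.
    alpha weights the target language, (1 - alpha) the source language.
    (The weighted-sum form is an assumption made for illustration.)"""
    return (alpha * cross_entropy(tgt_post, tgt_states)
            + (1.0 - alpha) * cross_entropy(src_post, src_states))
```

With `alpha = 1.0` the criterion reduces to ordinary target-only cross entropy, so the weight controls how strongly source-language alignments regularize the target-language fine-tuning.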
Field | Value
---|---
Original language | English (US)
Pages (from-to) | 3531-3535
Number of pages | 5
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume | 2015-January
State | Published - 2015
Event | 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015, Dresden, Germany; Sep 6, 2015 → Sep 10, 2015
Keywords
- Cross-lingual speech recognition
- Deep neural networks
- Hidden Markov models
- Transfer learning
ASJC Scopus subject areas
- Language and Linguistics
- Human-Computer Interaction
- Signal Processing
- Software
- Modeling and Simulation