Cross-lingual transfer learning during supervised training in low resource scenarios

Research output: Contribution to journal › Conference article

Abstract

In this study, transfer learning techniques are presented for cross-lingual speech recognition to mitigate the effects of limited data availability in a target language by using data from richly resourced source languages. A maximum likelihood (ML) based regularization criterion is used to learn context-dependent Gaussian mixture model (GMM) based hidden Markov model (HMM) parameters for phones in the target language using data from both the target and source languages. Recognition results indicate improved HMM state alignments. The hidden layers of a deep neural network (DNN) are then initialized using unsupervised pre-training of a multilingual deep belief network (DBN). First, the DNN is fine-tuned using a modified cross-entropy criterion that jointly uses HMM state alignments from both the target and source languages. Second, another DNN fine-tuning technique is explored in which training is performed sequentially: on the source language first, followed by the target language. Experiments conducted using varying amounts of target data indicate that improvements in performance can be obtained with joint and sequential training of the DNN compared to existing techniques. Turkish and English were chosen as the target and source languages, respectively.
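
The abstract does not spell out the form of the modified cross-entropy criterion, so the following is a minimal sketch of one plausible reading: a weighted sum of per-language cross-entropy terms over the DNN's HMM-state outputs. The function name joint_cross_entropy, the interpolation weight lam, and the use of PyTorch are illustrative assumptions, not details taken from the paper.

    import torch
    import torch.nn.functional as F

    def joint_cross_entropy(logits_tgt, states_tgt, logits_src, states_src, lam=0.5):
        """Sketch of a joint fine-tuning loss (assumed form, not the authors' code).

        logits_*: (num_frames, num_hmm_states) DNN outputs for each language.
        states_*: (num_frames,) HMM state labels from forced alignment.
        lam:      hypothetical weight on the source-language term.
        """
        ce_tgt = F.cross_entropy(logits_tgt, states_tgt)  # target-language frames vs. their alignments
        ce_src = F.cross_entropy(logits_src, states_src)  # source-language frames vs. their alignments
        return ce_tgt + lam * ce_src

Under this reading, each training step would draw frames from both languages and back-propagate the combined loss, whereas the sequential variant described above would minimize the source-language term first and then the target-language term alone.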

Original language: English (US)
Pages (from-to): 3531-3535
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2015-January
State: Published - Jan 1 2015
Event: 16th Annual Conference of the International Speech Communication Association, INTERSPEECH 2015 - Dresden, Germany
Duration: Sep 6 2015 - Sep 10 2015

Keywords

  • Cross-lingual speech recognition
  • Deep neural networks
  • Hidden Markov models
  • Transfer learning

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

@article{359ec5bc1cda4523895af1b07eb9f48b,
title = "Cross-lingual transfer learning during supervised training in low resource scenarios",
abstract = "In this study, transfer learning techniques are presented for cross-lingual speech recognition to mitigate the effects of limited availability of data in a target language using data from richly resourced source languages. A maximum likelihood (ML) based regularization criterion is used to learn context-dependent Gaussian mixture model (GMM) based hidden Markov model (HMM) parameters for phones in target language using data from both target and source languages. Recognition results indicate improved HMM state alignments. The hidden layers of a deep neural network (DNN) are then initialized using unsupervised pre-training of a multilingual deep belief network (DBN). First, the DNN is fine-tuned using a modified cross entropy criterion that jointly uses HMM state alignments from both target and source languages. Second, another DNN fine-tuning technique is explored where the training is performed in a sequential manner - source language followed by the target language. Experiments conducted using varying amounts of target data indicate improvements in performance can be obtained using joint and sequential training of the DNN compared to existing techniques. Turkish and English were chosen to be the target and source languages respectively.",
keywords = "Cross-lingual speech recognition, Deep neural networks, Hidden Markov models, Transfer learning",
author = "Amit Das and Hasegawa-Johnson, {Mark Allan}",
year = "2015",
month = "1",
day = "1",
language = "English (US)",
volume = "2015-January",
pages = "3531--3535",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY - JOUR

T1 - Cross-lingual transfer learning during supervised training in low resource scenarios

AU - Das, Amit

AU - Hasegawa-Johnson, Mark Allan

PY - 2015/1/1

Y1 - 2015/1/1

N2 - In this study, transfer learning techniques are presented for cross-lingual speech recognition to mitigate the effects of limited data availability in a target language by using data from richly resourced source languages. A maximum likelihood (ML) based regularization criterion is used to learn context-dependent Gaussian mixture model (GMM) based hidden Markov model (HMM) parameters for phones in the target language using data from both the target and source languages. Recognition results indicate improved HMM state alignments. The hidden layers of a deep neural network (DNN) are then initialized using unsupervised pre-training of a multilingual deep belief network (DBN). First, the DNN is fine-tuned using a modified cross-entropy criterion that jointly uses HMM state alignments from both the target and source languages. Second, another DNN fine-tuning technique is explored in which training is performed sequentially: on the source language first, followed by the target language. Experiments conducted using varying amounts of target data indicate that improvements in performance can be obtained with joint and sequential training of the DNN compared to existing techniques. Turkish and English were chosen as the target and source languages, respectively.

KW - Cross-lingual speech recognition

KW - Deep neural networks

KW - Hidden Markov models

KW - Transfer learning

UR - http://www.scopus.com/inward/record.url?scp=84959115246&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959115246&partnerID=8YFLogxK

M3 - Conference article

VL - 2015-January

SP - 3531

EP - 3535

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -