An investigation on training deep neural networks using probabilistic transcriptions

Research output: Contribution to journal › Conference article

Abstract

In this study, a transfer learning technique is presented for crosslingual speech recognition in an adverse scenario where there are no natively transcribed transcriptions in the target language. The transcriptions that are available during training are transcribed by crowd workers who neither speak nor have any familiarity with the target language. Hence, such transcriptions are likely to be inaccurate. Training a deep neural network (DNN) in such a scenario is challenging; previously reported results have described DNN error rates exceeding the error rate of an adapted Gaussian Mixture Model (GMM). This paper investigates multi-task learning techniques using deep neural networks which are suitable for this scenario. We report, for the first time, absolute improvement in phone error rates (PER) in the range 1.3-6.2% over GMMs adapted to probabilistic transcriptions. Results are reported for Swahili, Hungarian, and Mandarin.
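The abstract describes multi-task training of a DNN where one output head is fit to probabilistic (crowd-derived) phone transcriptions of the target language. The sketch below is purely illustrative and is not the authors' architecture: the layer sizes, the auxiliary source-language head, the mixing weight `alpha`, and all variable names are assumptions. It only shows the core idea of combining a cross-entropy loss against a soft label *distribution* with a standard hard-label loss over a shared representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multitask_loss(x, W_shared, W_target, W_source, y_soft, y_source, alpha=0.5):
    """Forward pass of a toy one-hidden-layer multi-task net.

    A shared hidden layer feeds two softmax heads:
      - a target-language head trained on soft (probabilistic) phone labels,
      - an auxiliary source-language head trained on hard labels.
    """
    h = np.tanh(x @ W_shared)        # shared representation
    p_t = softmax(h @ W_target)      # target-language phone posteriors
    p_s = softmax(h @ W_source)      # auxiliary source-language posteriors
    # Cross-entropy against a *distribution* over phones (soft labels)
    ce_target = -np.sum(y_soft * np.log(p_t + 1e-12), axis=1).mean()
    # Standard cross-entropy against hard auxiliary labels
    ce_source = -np.log(p_s[np.arange(len(y_source)), y_source] + 1e-12).mean()
    return alpha * ce_target + (1 - alpha) * ce_source

# Toy batch: 4 frames of 10-dimensional acoustic features
x = rng.normal(size=(4, 10))
W_shared = rng.normal(size=(10, 16)) * 0.1
W_target = rng.normal(size=(16, 5)) * 0.1   # 5 target-language phones
W_source = rng.normal(size=(16, 7)) * 0.1   # 7 source-language phones
y_soft = softmax(rng.normal(size=(4, 5)))   # crowd-derived phone distributions
y_source = rng.integers(0, 7, size=4)       # hard auxiliary labels

loss = multitask_loss(x, W_shared, W_target, W_source, y_soft, y_source)
```

In a real system the heads would sit on a deep shared stack trained by backpropagation; the point here is only that the soft-label term lets uncertain crowd transcriptions contribute proportionally to their probability mass rather than as a single forced label.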

Original language: English (US)
Pages (from-to): 3858-3862
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 08-12-September-2016
DOIs: 10.21437/Interspeech.2016-655
State: Published - Jan 1 2016
Event: 17th Annual Conference of the International Speech Communication Association, INTERSPEECH 2016 - San Francisco, United States
Duration: Sep 8 2016 – Sep 16 2016

Keywords

  • Cross-lingual speech recognition
  • Deep neural networks
  • Probabilistic transcription grant
  • Transfer learning

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

@article{a4e829789f9a4f64ab824771cb243ce5,
title = "An investigation on training deep neural networks using probabilistic transcriptions",
abstract = "In this study, a transfer learning technique is presented for crosslingual speech recognition in an adverse scenario where there are no natively transcribed transcriptions in the target language. The transcriptions that are available during training are transcribed by crowd workers who neither speak nor have any familiarity with the target language. Hence, such transcriptions are likely to be inaccurate. Training a deep neural network (DNN) in such a scenario is challenging; previously reported results have described DNN error rates exceeding the error rate of an adapted Gaussian Mixture Model (GMM). This paper investigates multi-task learning techniques using deep neural networks which are suitable for this scenario. We report, for the first time, absolute improvement in phone error rates (PER) in the range 1.3-6.2{\%} over GMMs adapted to probabilistic transcriptions. Results are reported for Swahili, Hungarian, and Mandarin.",
keywords = "Cross-lingual speech recognition, Deep neural networks, Probabilistic transcription grant, Transfer learning",
author = "Amit Das and Hasegawa-Johnson, {Mark Allan}",
year = "2016",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2016-655",
language = "English (US)",
volume = "08-12-September-2016",
pages = "3858--3862",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",
}

TY  - JOUR
T1  - An investigation on training deep neural networks using probabilistic transcriptions
AU  - Das, Amit
AU  - Hasegawa-Johnson, Mark Allan
PY  - 2016/1/1
Y1  - 2016/1/1
N2  - In this study, a transfer learning technique is presented for crosslingual speech recognition in an adverse scenario where there are no natively transcribed transcriptions in the target language. The transcriptions that are available during training are transcribed by crowd workers who neither speak nor have any familiarity with the target language. Hence, such transcriptions are likely to be inaccurate. Training a deep neural network (DNN) in such a scenario is challenging; previously reported results have described DNN error rates exceeding the error rate of an adapted Gaussian Mixture Model (GMM). This paper investigates multi-task learning techniques using deep neural networks which are suitable for this scenario. We report, for the first time, absolute improvement in phone error rates (PER) in the range 1.3-6.2% over GMMs adapted to probabilistic transcriptions. Results are reported for Swahili, Hungarian, and Mandarin.
AB  - In this study, a transfer learning technique is presented for crosslingual speech recognition in an adverse scenario where there are no natively transcribed transcriptions in the target language. The transcriptions that are available during training are transcribed by crowd workers who neither speak nor have any familiarity with the target language. Hence, such transcriptions are likely to be inaccurate. Training a deep neural network (DNN) in such a scenario is challenging; previously reported results have described DNN error rates exceeding the error rate of an adapted Gaussian Mixture Model (GMM). This paper investigates multi-task learning techniques using deep neural networks which are suitable for this scenario. We report, for the first time, absolute improvement in phone error rates (PER) in the range 1.3-6.2% over GMMs adapted to probabilistic transcriptions. Results are reported for Swahili, Hungarian, and Mandarin.
KW  - Cross-lingual speech recognition
KW  - Deep neural networks
KW  - Probabilistic transcription grant
KW  - Transfer learning
UR  - http://www.scopus.com/inward/record.url?scp=84994309649&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84994309649&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2016-655
DO  - 10.21437/Interspeech.2016-655
M3  - Conference article
AN  - SCOPUS:84994309649
VL  - 08-12-September-2016
SP  - 3858
EP  - 3862
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN  - 2308-457X
ER  -