Bayesian Models for Unit Discovery on a Very Low Resource Language

Lucas Ondel, Pierre Godard, Laurent Besacier, Elin Larsen, Mark Hasegawa-Johnson, Odette Scharenborg, Emmanuel Dupoux, Lukas Burget, Francois Yvon, Sanjeev Khudanpur

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Developing speech technologies for low-resource languages has become a very active research field over the last decade. Among others, Bayesian models have shown some promising results on artificial examples but still lack of in situ experiments. Our work applies state-of-the-art Bayesian models to unsupervised Acoustic Unit Discovery (AUD) in a real low-resource language scenario. We also show that Bayesian models can naturally integrate information from other resourceful languages by means of informative prior leading to more consistent discovered units. Finally, discovered acoustic units are used, either as the I-best sequence or as a lattice, to perform word segmentation. Word segmentation results show that this Bayesian approach clearly outperforms a Segmental-DTW baseline on the same corpus.

Original languageEnglish (US)
Title of host publication2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages5939-5943
Number of pages5
ISBN (Print)9781538646588
DOIs
StatePublished - Sep 10 2018
Event2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: Apr 15 2018Apr 20 2018

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2018-April
ISSN (Print)1520-6149

Other

Other2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
CountryCanada
CityCalgary
Period4/15/184/20/18

    Fingerprint

Keywords

  • Acoustic Unit Discovery
  • Bayesian Model
  • Informative Prior
  • Low-Resource ASR

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Cite this

Ondel, L., Godard, P., Besacier, L., Larsen, E., Hasegawa-Johnson, M., Scharenborg, O., Dupoux, E., Burget, L., Yvon, F., & Khudanpur, S. (2018). Bayesian Models for Unit Discovery on a Very Low Resource Language. In 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings (pp. 5939-5943). [8461545] (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2018-April). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICASSP.2018.8461545