The Time-Course of Phoneme Category Adaptation in Deep Neural Networks

Junrui Ni, Mark Hasegawa-Johnson, Odette Scharenborg

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Both human listeners and machines need to adapt their sound categories whenever a new speaker is encountered. This perceptual learning is driven by lexical information. In previous work, we have shown that deep neural network-based (DNN) ASR systems can learn to adapt their phoneme category boundaries from a few labeled examples after exposure (i.e., training) to ambiguous sounds, as humans have been found to do. Here, we investigate the time-course of phoneme category adaptation in a DNN in more detail, with the ultimate aim to investigate the DNN’s ability to serve as a model of human perceptual learning. We do so by providing the DNN with an increasing number of ambiguous retraining tokens (in 10 bins of 4 ambiguous items), and comparing classification accuracy on the ambiguous items in a held-out test set for the different bins. Results showed that DNNs, similar to human listeners, show a step-like function: The DNNs show perceptual learning already after the first bin (only 4 tokens of the ambiguous phone), with little further adaptation for subsequent bins. In follow-up research, we plan to test specific predictions made by the DNN about human speech processing.
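The binned retraining schedule described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the bins are cumulative (the retraining set after bin k holds the first k bins), and the names `cumulative_bins`, `time_course`, `retrain`, and `accuracy` are hypothetical placeholders for whatever DNN retraining and held-out evaluation routines are actually used.

```python
from typing import Callable, List, Sequence, TypeVar

Token = TypeVar("Token")
Model = TypeVar("Model")


def cumulative_bins(
    tokens: Sequence[Token], bin_size: int = 4, n_bins: int = 10
) -> List[List[Token]]:
    """Return n_bins retraining sets; set k holds the first (k + 1) * bin_size tokens.

    Assumption: exposure is cumulative, so each set extends the previous one.
    """
    needed = bin_size * n_bins
    if len(tokens) < needed:
        raise ValueError(f"need {needed} tokens, got {len(tokens)}")
    return [list(tokens[: (k + 1) * bin_size]) for k in range(n_bins)]


def time_course(
    tokens: Sequence[Token],
    retrain: Callable[[List[Token]], Model],
    accuracy: Callable[[Model], float],
    bin_size: int = 4,
    n_bins: int = 10,
) -> List[float]:
    """Retrain on each cumulative set and record held-out accuracy per bin.

    `retrain` and `accuracy` are hypothetical callables supplied by the caller:
    `retrain` maps a list of ambiguous tokens to an adapted model, and
    `accuracy` scores that model on the held-out ambiguous test items.
    """
    return [
        accuracy(retrain(subset))
        for subset in cumulative_bins(tokens, bin_size, n_bins)
    ]
```

With 40 ambiguous tokens and the default settings, `cumulative_bins` yields retraining sets of 4, 8, ..., 40 tokens, and `time_course` returns one accuracy value per bin. A step-like time-course of the kind the abstract reports would appear as a jump at the first value followed by a roughly flat curve.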

Original language: English (US)
Title of host publication: Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Proceedings
Editors: Carlos Martín-Vide, Matthew Purver, Senja Pollak
Publisher: Springer
Pages: 3-15
Number of pages: 13
ISBN (Print): 9783030313715
DOIs: https://doi.org/10.1007/978-3-030-31372-2_1
State: Published - Jan 1 2019
Event: 7th International Conference on Statistical Language and Speech Processing, SLSP 2019 - Ljubljana, Slovenia
Duration: Oct 14 2019 to Oct 16 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11816 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 7th International Conference on Statistical Language and Speech Processing, SLSP 2019
Country: Slovenia
City: Ljubljana
Period: 10/14/19 to 10/16/19

Keywords

  • Deep neural networks
  • Human perceptual learning
  • Phoneme category adaptation
  • Time-course

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Ni, J., Hasegawa-Johnson, M., & Scharenborg, O. (2019). The Time-Course of Phoneme Category Adaptation in Deep Neural Networks. In C. Martín-Vide, M. Purver, & S. Pollak (Eds.), Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Proceedings (pp. 3-15). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 11816 LNAI). Springer. https://doi.org/10.1007/978-3-030-31372-2_1