TY - GEN
T1 - The Time-Course of Phoneme Category Adaptation in Deep Neural Networks
AU - Ni, Junrui
AU - Hasegawa-Johnson, Mark
AU - Scharenborg, Odette
N1 - The authors thank Anne Merel Sternheim and Sebastian Tiesmeyer for their help in earlier stages of this research, and Louis ten Bosch for providing the forced alignments of the retraining material. This work was carried out by the first author under the supervision of the second and third authors.
PY - 2019
Y1 - 2019
N2 - Both human listeners and machines need to adapt their sound categories whenever a new speaker is encountered. This perceptual learning is driven by lexical information. In previous work, we have shown that deep neural network (DNN)-based automatic speech recognition (ASR) systems can learn to adapt their phoneme category boundaries from a few labeled examples after exposure (i.e., training) to ambiguous sounds, as humans have been found to do. Here, we investigate the time-course of phoneme category adaptation in a DNN in more detail, with the ultimate aim of evaluating the DNN's ability to serve as a model of human perceptual learning. We do so by providing the DNN with an increasing number of ambiguous retraining tokens (in 10 bins of 4 ambiguous items), and comparing classification accuracy on the ambiguous items in a held-out test set across the different bins. Results showed that DNNs, like human listeners, exhibit a step-like function: the DNNs show perceptual learning after only the first bin (4 tokens of the ambiguous phone), with little further adaptation for subsequent bins. In follow-up research, we plan to test specific predictions made by the DNN about human speech processing.
KW - Deep neural networks
KW - Human perceptual learning
KW - Phoneme category adaptation
KW - Time-course
UR - http://www.scopus.com/inward/record.url?scp=85075883129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075883129&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-31372-2_1
DO - 10.1007/978-3-030-31372-2_1
M3 - Conference contribution
AN - SCOPUS:85075883129
SN - 9783030313715
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 15
BT - Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Proceedings
A2 - Martín-Vide, Carlos
A2 - Purver, Matthew
A2 - Pollak, Senja
PB - Springer
T2 - 7th International Conference on Statistical Language and Speech Processing, SLSP 2019
Y2 - 14 October 2019 through 16 October 2019
ER -