TY - GEN
T1 - The Time-Course of Phoneme Category Adaptation in Deep Neural Networks
AU - Ni, Junrui
AU - Hasegawa-Johnson, Mark
AU - Scharenborg, Odette
N1 - Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
AB - Both human listeners and machines need to adapt their sound categories whenever a new speaker is encountered. This perceptual learning is driven by lexical information. In previous work, we showed that deep neural network (DNN)-based ASR systems can learn to adapt their phoneme category boundaries from a few labeled examples after exposure (i.e., training) to ambiguous sounds, as humans have been found to do. Here, we investigate the time-course of phoneme category adaptation in a DNN in more detail, with the ultimate aim of assessing the DNN’s ability to serve as a model of human perceptual learning. We do so by providing the DNN with an increasing number of ambiguous retraining tokens (in 10 bins of 4 ambiguous items each) and comparing classification accuracy on the ambiguous items in a held-out test set across bins. The results showed that the DNNs, like human listeners, exhibit a step-like function: perceptual learning occurs already after the first bin (only 4 tokens of the ambiguous phone), with little further adaptation for subsequent bins. In follow-up research, we plan to test specific predictions made by the DNN about human speech processing.
KW - Deep neural networks
KW - Human perceptual learning
KW - Phoneme category adaptation
KW - Time-course
UR - http://www.scopus.com/inward/record.url?scp=85075883129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075883129&partnerID=8YFLogxK
DO - 10.1007/978-3-030-31372-2_1
M3 - Conference contribution
AN - SCOPUS:85075883129
SN - 9783030313715
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 3
EP - 15
BT - Statistical Language and Speech Processing - 7th International Conference, SLSP 2019, Proceedings
A2 - Martín-Vide, Carlos
A2 - Purver, Matthew
A2 - Pollak, Senja
PB - Springer
T2 - 7th International Conference on Statistical Language and Speech Processing, SLSP 2019
Y2 - 14 October 2019 through 16 October 2019
ER -