TY - JOUR
T1 - Infant emotional outbursts detection in infant-parent spoken interactions
AU - Xu, Yijia
AU - Hasegawa-Johnson, Mark Allan
AU - McElwain, Nancy L.
N1 - Funding Information:
We thank Macie Berg, Rachel Diechstetter, Emily Flammersfeld, Elizabeth Mooney, and Shreya Patel, who manually annotated the LENA files. This study was supported by a seed grant from the Social and Behavioral Sciences Research Initiative at the University of Illinois at Urbana-Champaign.
Publisher Copyright:
© 2018 International Speech Communication Association. All rights reserved.
PY - 2018
Y1 - 2018
N2 - Detection of infant emotional outbursts, such as crying, in large corpora of recorded infant speech is essential to the study of the dyadic social processes by which infants learn to identify and regulate their own emotions. Such large corpora now exist with the advent of LENA speech monitoring systems, but are not labeled for emotional outbursts. This paper reports on our efforts to manually code child utterances as being of type "laugh", "cry", "fuss", "babble", and "hiccup", and to develop algorithms capable of performing the same task automatically. Human labelers achieve much higher rates of inter-coder agreement for some of these categories than for others. Linear discriminant analysis (LDA) achieves better accuracy on tokens that have been coded by two human labelers than on tokens that have been coded by only one labeler, but the difference is not as large as we expected, suggesting that the acoustic and contextual features being used by human labelers are not yet available to the LDA. A convolutional neural network and a hidden Markov model achieve better accuracy than LDA, but worse F-score, because they over-weight the prior. Discounting the transition probability does not solve the problem.
AB - Detection of infant emotional outbursts, such as crying, in large corpora of recorded infant speech is essential to the study of the dyadic social processes by which infants learn to identify and regulate their own emotions. Such large corpora now exist with the advent of LENA speech monitoring systems, but are not labeled for emotional outbursts. This paper reports on our efforts to manually code child utterances as being of type "laugh", "cry", "fuss", "babble", and "hiccup", and to develop algorithms capable of performing the same task automatically. Human labelers achieve much higher rates of inter-coder agreement for some of these categories than for others. Linear discriminant analysis (LDA) achieves better accuracy on tokens that have been coded by two human labelers than on tokens that have been coded by only one labeler, but the difference is not as large as we expected, suggesting that the acoustic and contextual features being used by human labelers are not yet available to the LDA. A convolutional neural network and a hidden Markov model achieve better accuracy than LDA, but worse F-score, because they over-weight the prior. Discounting the transition probability does not solve the problem.
KW - Convolutional neural network
KW - Hidden Markov model
KW - Infant emotional outbursts
KW - Infant vocalizations
KW - Linear discriminant analysis
UR - http://www.scopus.com/inward/record.url?scp=85054994129&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054994129&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2018-2429
DO - 10.21437/Interspeech.2018-2429
M3 - Conference article
AN - SCOPUS:85054994129
SN - 2308-457X
VL - 2018-September
SP - 242
EP - 246
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018
Y2 - 2 September 2018 through 6 September 2018
ER -