Infant emotional outbursts detection in infant-parent spoken interactions

Yijia Xu, Mark Allan Hasegawa-Johnson, Nancy L. McElwain

Research output: Contribution to journal › Conference article

Abstract

Detection of infant emotional outbursts, such as crying, in large corpora of recorded infant speech, is essential to the study of dyadic social process, by which infants learn to identify and regulate their own emotions. Such large corpora now exist with the advent of LENA speech monitoring systems, but are not labeled for emotional outbursts. This paper reports on our efforts to manually code child utterances as being of type "laugh", "cry", "fuss", "babble" and "hiccup", and to develop algorithms capable of performing the same task automatically. Human labelers achieve much higher rates of inter-coder agreement for some of these categories than for others. Linear discriminant analysis (LDA) achieves better accuracy on tokens that have been coded by two human labelers than on tokens that have been coded by only one labeler, but the difference is not as much as we expected, suggesting that the acoustic and contextual features being used by human labelers are not yet available to the LDA. Convolutional neural network and hidden Markov model achieve better accuracy than LDA, but worse F-score, because they over-weight the prior. Discounting the transition probability does not solve the problem.
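The abstract's last finding (better accuracy but worse F-score when a classifier over-weights the prior) can be illustrated with a small sketch. Everything below is hypothetical and not taken from the paper: the class counts, both sets of predictions, and the use of a macro-averaged F-score are assumptions chosen only to show how the two metrics can disagree on imbalanced data.

```python
# Illustrative only: the label counts and predictions are invented,
# and macro-averaged F1 is an assumed choice of F-score.
LABELS = ["babble", "cry", "fuss", "laugh", "hiccup"]
RARE = ["cry", "fuss", "laugh", "hiccup"]


def accuracy(y_true, y_pred):
    """Fraction of tokens whose predicted label matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced test set: 80 "babble" tokens, 5 of each other type.
y_true = ["babble"] * 80
for label in RARE:
    y_true += [label] * 5

# Classifier A leans entirely on the prior: it always predicts the majority class.
pred_prior = ["babble"] * 100

# Classifier B spreads its errors: 60 of 80 "babble" tokens are correct
# (5 are mistaken for each rare class), and 4 of 5 tokens are correct
# in every rare class.
pred_spread = ["babble"] * 60
for label in RARE:
    pred_spread += [label] * 5
for label in RARE:
    pred_spread += [label] * 4 + ["babble"]

for name, pred in [("prior-heavy", pred_prior), ("spread", pred_spread)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.3f}, "
          f"macro-F1={macro_f1(y_true, pred):.3f}")
```

In this toy setting the prior-heavy classifier wins on accuracy because the majority class dominates the token count, while its F-score collapses because it never detects the rare classes.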

Original language: English (US)
Pages (from-to): 242-246
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2018-September
DOIs: 10.21437/Interspeech.2018-2429
State: Published - Jan 1 2018
Event: 19th Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 - Hyderabad, India
Duration: Sep 2 2018 to Sep 6 2018

Fingerprint

Discriminant analysis
Interaction
Discounting
Hidden Markov models
Monitoring System
Transition Probability
Markov Model
Acoustics
Neural Networks
Monitoring
Emotion
Human
Linear Discriminant Analysis
Spoken Interaction
Speech
Corpus

Keywords

  • Convolutional neural network
  • Hidden Markov model
  • Infant emotional outbursts
  • Infant vocalizations
  • Linear discriminant analysis

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modeling and Simulation

Cite this

Infant emotional outbursts detection in infant-parent spoken interactions. / Xu, Yijia; Hasegawa-Johnson, Mark Allan; McElwain, Nancy L.

In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2018-September, 01.01.2018, p. 242-246.

@article{2114ecd7f07b449f981c7f7b91cd28a6,
title = "Infant emotional outbursts detection in infant-parent spoken interactions",
abstract = "Detection of infant emotional outbursts, such as crying, in large corpora of recorded infant speech, is essential to the study of dyadic social process, by which infants learn to identify and regulate their own emotions. Such large corpora now exist with the advent of LENA speech monitoring systems, but are not labeled for emotional outbursts. This paper reports on our efforts to manually code child utterances as being of type ``laugh'', ``cry'', ``fuss'', ``babble'' and ``hiccup'', and to develop algorithms capable of performing the same task automatically. Human labelers achieve much higher rates of inter-coder agreement for some of these categories than for others. Linear discriminant analysis (LDA) achieves better accuracy on tokens that have been coded by two human labelers than on tokens that have been coded by only one labeler, but the difference is not as much as we expected, suggesting that the acoustic and contextual features being used by human labelers are not yet available to the LDA. Convolutional neural network and hidden Markov model achieve better accuracy than LDA, but worse F-score, because they over-weight the prior. Discounting the transition probability does not solve the problem.",
keywords = "Convolutional neural network, Hidden Markov model, Infant emotional outbursts, Infant vocalizations, Linear discriminant analysis",
author = "Xu, Yijia and Hasegawa-Johnson, {Mark Allan} and McElwain, {Nancy L.}",
year = "2018",
month = "1",
day = "1",
doi = "10.21437/Interspeech.2018-2429",
language = "English (US)",
volume = "2018-September",
pages = "242--246",
journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
issn = "2308-457X",

}

TY - JOUR

T1 - Infant emotional outbursts detection in infant-parent spoken interactions

AU - Xu, Yijia

AU - Hasegawa-Johnson, Mark Allan

AU - McElwain, Nancy L.

PY - 2018/1/1

Y1 - 2018/1/1

AB - Detection of infant emotional outbursts, such as crying, in large corpora of recorded infant speech, is essential to the study of dyadic social process, by which infants learn to identify and regulate their own emotions. Such large corpora now exist with the advent of LENA speech monitoring systems, but are not labeled for emotional outbursts. This paper reports on our efforts to manually code child utterances as being of type "laugh", "cry", "fuss", "babble" and "hiccup", and to develop algorithms capable of performing the same task automatically. Human labelers achieve much higher rates of inter-coder agreement for some of these categories than for others. Linear discriminant analysis (LDA) achieves better accuracy on tokens that have been coded by two human labelers than on tokens that have been coded by only one labeler, but the difference is not as much as we expected, suggesting that the acoustic and contextual features being used by human labelers are not yet available to the LDA. Convolutional neural network and hidden Markov model achieve better accuracy than LDA, but worse F-score, because they over-weight the prior. Discounting the transition probability does not solve the problem.

KW - Convolutional neural network

KW - Hidden Markov model

KW - Infant emotional outbursts

KW - Infant vocalizations

KW - Linear discriminant analysis

UR - http://www.scopus.com/inward/record.url?scp=85054994129&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054994129&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2018-2429

DO - 10.21437/Interspeech.2018-2429

M3 - Conference article

AN - SCOPUS:85054994129

VL - 2018-September

SP - 242

EP - 246

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SN - 2308-457X

ER -