Abstract

This paper addresses the manual and automatic labelling, from spontaneous speech, of a particular type of user affect that we call the cognitive state in a tutorial dialogue system with students of primary and early middle school ages. Our definition of the cognitive state is based on analysis of children's spontaneous speech, acquired during Wizard-of-Oz simulations of an intelligent math and physics tutor. The cognitive states of children are categorized into three classes: confidence, puzzlement, and hesitation. The manual labelling of cognitive states achieved an inter-transcriber agreement of 0.93 (kappa score). The automatic cognitive state labels are generated by classifying prosodic features, text features, and spectral features. Text features are generated from an automatic speech recognition (ASR) system; features include indicator functions of keyword classes and part-of-speech sequences. Spectral features are created based on acoustic likelihood scores of a cognitive state-dependent ASR system, in which phoneme models are adapted to utterances labelled for a particular cognitive state. The effectiveness of the proposed method has been tested on both manually and automatically transcribed speech, and the test yielded very high correctness: 96.6% for manually transcribed speech and 95.7% for automatically recognized speech. Our study shows that the proposed spectral features greatly outperformed the other types of features in the cognitive state classification experiments. Our study also shows that the spectral and prosodic features derived directly from speech signals were very robust to speech recognition errors, much more than the lexical and part-of-speech based features.
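The 0.93 agreement figure the abstract reports is a chance-corrected kappa statistic between two transcribers. As a minimal sketch of how such a score is computed for a two-annotator setting (the labels below are hypothetical, invented for illustration; the paper's actual annotation data is not reproduced here):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators on the same items."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels drawn from the paper's three cognitive-state classes.
a = ["confidence", "confidence", "puzzlement", "hesitation", "confidence"]
b = ["confidence", "confidence", "puzzlement", "hesitation", "puzzlement"]
print(cohens_kappa(a, b))  # 0.6875 for this toy example
```

A kappa near 0.93, as reported, indicates almost-perfect agreement on Landis and Koch's conventional scale; the toy labels above land in the "substantial" range.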

Original language: English (US)
Pages (from-to): 616-632
Number of pages: 17
Journal: Speech Communication
Volume: 48
Issue number: 6
DOI: 10.1016/j.specom.2005.09.006
State: Published - Jun 1 2006


Keywords

  • Intelligent tutoring system
  • Spoken language processing
  • User affect recognition

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

Cite this

Cognitive state classification in a spoken tutorial dialogue system. / Zhang, Tong; Hasegawa-Johnson, Mark Allan; Levinson, Stephen E.

In: Speech Communication, Vol. 48, No. 6, 01.06.2006, p. 616-632.

Research output: Contribution to journal › Article

@article{5986f47f674249c29bb8ac787e0b42e5,
title = "Cognitive state classification in a spoken tutorial dialogue system",
abstract = "This paper addresses the manual and automatic labelling, from spontaneous speech, of a particular type of user affect that we call the cognitive state in a tutorial dialogue system with students of primary and early middle school ages. Our definition of the cognitive state is based on analysis of children's spontaneous speech, which is acquired during Wizard-of-Oz simulations of an intelligent math and physics tutor. The cognitive states of children are categorized into three classes: confidence, puzzlement, and hesitation. The manual labelling of cognitive states had an inter-transcriber agreement of kappa score 0.93. The automatic cognitive state labels are generated by classifying prosodic features, text features, and spectral features. Text features are generated from an automatic speech recognition (ASR) system; features include indicator functions of keyword classes and part-of-speech sequences. Spectral features are created based on acoustic likelihood scores of a cognitive state-dependent ASR system, in which phoneme models are adapted to utterances labelled for a particular cognitive state. The effectiveness of the proposed method has been tested on both manually and automatically transcribed speech, and the test yielded very high correctness: 96.6{\%} for manually transcribed speech and 95.7{\%} for automatically recognized speech. Our study shows that the proposed spectral features greatly outperformed the other types of features in the cognitive state classification experiments. Our study also shows that the spectral and prosodic features derived directly from speech signals were very robust to speech recognition errors, much more than the lexical and part-of-speech based features.",
keywords = "Intelligent tutoring system, Spoken language processing, User affect recognition",
author = "Tong Zhang and Hasegawa-Johnson, {Mark Allan} and Levinson, {Stephen E}",
year = "2006",
month = "6",
day = "1",
doi = "10.1016/j.specom.2005.09.006",
language = "English (US)",
volume = "48",
pages = "616--632",
journal = "Speech Communication",
issn = "0167-6393",
publisher = "Elsevier",
number = "6",
}

TY - JOUR

T1 - Cognitive state classification in a spoken tutorial dialogue system

AU - Zhang, Tong

AU - Hasegawa-Johnson, Mark Allan

AU - Levinson, Stephen E

PY - 2006/6/1

Y1 - 2006/6/1

N2 - This paper addresses the manual and automatic labelling, from spontaneous speech, of a particular type of user affect that we call the cognitive state in a tutorial dialogue system with students of primary and early middle school ages. Our definition of the cognitive state is based on analysis of children's spontaneous speech, which is acquired during Wizard-of-Oz simulations of an intelligent math and physics tutor. The cognitive states of children are categorized into three classes: confidence, puzzlement, and hesitation. The manual labelling of cognitive states had an inter-transcriber agreement of kappa score 0.93. The automatic cognitive state labels are generated by classifying prosodic features, text features, and spectral features. Text features are generated from an automatic speech recognition (ASR) system; features include indicator functions of keyword classes and part-of-speech sequences. Spectral features are created based on acoustic likelihood scores of a cognitive state-dependent ASR system, in which phoneme models are adapted to utterances labelled for a particular cognitive state. The effectiveness of the proposed method has been tested on both manually and automatically transcribed speech, and the test yielded very high correctness: 96.6% for manually transcribed speech and 95.7% for automatically recognized speech. Our study shows that the proposed spectral features greatly outperformed the other types of features in the cognitive state classification experiments. Our study also shows that the spectral and prosodic features derived directly from speech signals were very robust to speech recognition errors, much more than the lexical and part-of-speech based features.

AB - This paper addresses the manual and automatic labelling, from spontaneous speech, of a particular type of user affect that we call the cognitive state in a tutorial dialogue system with students of primary and early middle school ages. Our definition of the cognitive state is based on analysis of children's spontaneous speech, which is acquired during Wizard-of-Oz simulations of an intelligent math and physics tutor. The cognitive states of children are categorized into three classes: confidence, puzzlement, and hesitation. The manual labelling of cognitive states had an inter-transcriber agreement of kappa score 0.93. The automatic cognitive state labels are generated by classifying prosodic features, text features, and spectral features. Text features are generated from an automatic speech recognition (ASR) system; features include indicator functions of keyword classes and part-of-speech sequences. Spectral features are created based on acoustic likelihood scores of a cognitive state-dependent ASR system, in which phoneme models are adapted to utterances labelled for a particular cognitive state. The effectiveness of the proposed method has been tested on both manually and automatically transcribed speech, and the test yielded very high correctness: 96.6% for manually transcribed speech and 95.7% for automatically recognized speech. Our study shows that the proposed spectral features greatly outperformed the other types of features in the cognitive state classification experiments. Our study also shows that the spectral and prosodic features derived directly from speech signals were very robust to speech recognition errors, much more than the lexical and part-of-speech based features.

KW - Intelligent tutoring system

KW - Spoken language processing

KW - User affect recognition

UR - http://www.scopus.com/inward/record.url?scp=33646257071&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33646257071&partnerID=8YFLogxK

U2 - 10.1016/j.specom.2005.09.006

DO - 10.1016/j.specom.2005.09.006

M3 - Article

AN - SCOPUS:33646257071

VL - 48

SP - 616

EP - 632

JO - Speech Communication

JF - Speech Communication

SN - 0167-6393

IS - 6

ER -