Audio-visual affect recognition

Zhihong Zeng, Jilin Tu, Ming Liu, Thomas S. Huang, Brian Pianfetti, Dan Roth, Stephen Levinson

Research output: Contribution to journal › Article

Abstract

The ability of a computer to detect and appropriately respond to changes in a user's affective state has significant implications for Human-Computer Interaction (HCI). In this paper, we present our efforts toward audio-visual affect recognition of 11 affective states customized for HCI applications (four cognitive/motivational and seven basic affective states), collected from 20 non-actor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. The feature selection analysis shows that, while speaking, subjects tend to express their affect through brow movement in the face and through pitch and energy in prosody. For person-dependent recognition, we apply a voting method to combine the frame-based classification results from the audio and visual channels, which yields a 7.5% improvement over the best unimodal performance. For the person-independent test, we apply a multistream HMM to combine the information from multiple component streams, which yields a 6.1% improvement over the best component performance.
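The abstract names two fusion strategies without implementation detail: a majority vote over frame-level audio and visual classifications for the person-dependent case, and a weighted combination of component-stream likelihoods in a multistream HMM for the person-independent case. The sketch below illustrates only those two combination rules; it is not the authors' implementation, and the affect labels, stream weights, and numbers are hypothetical.

```python
# Illustrative sketch only -- not the paper's code. Shows (1) frame-level
# majority voting across the audio and visual channels and (2) the usual
# multistream-HMM rule of summing per-stream log-likelihoods with stream
# weights. All labels, weights, and numbers below are made up.
from collections import Counter

def vote_fusion(audio_frame_labels, visual_frame_labels):
    """Decision-level fusion: pool per-frame predictions from both
    channels and return the majority label."""
    votes = Counter(audio_frame_labels) + Counter(visual_frame_labels)
    return votes.most_common(1)[0][0]

def multistream_decision(per_class_stream_logliks, stream_weights):
    """Pick the affect class with the largest weighted sum of per-stream
    HMM log-likelihoods: score(c) = sum_s w_s * log P(O_s | c)."""
    return max(
        per_class_stream_logliks,
        key=lambda c: sum(w * ll for w, ll in
                          zip(stream_weights, per_class_stream_logliks[c])),
    )

if __name__ == "__main__":
    # Hypothetical frame-level predictions for one utterance.
    audio  = ["confusion", "confusion", "neutral", "confusion"]
    visual = ["confusion", "neutral", "confusion", "confusion"]
    print(vote_fusion(audio, visual))                       # -> confusion

    # Hypothetical per-class log-likelihoods for three component streams.
    logliks = {"confusion":   [-10.2, -12.5, -11.0],
               "frustration": [-11.7, -12.1, -12.4]}
    print(multistream_decision(logliks, [0.5, 0.3, 0.2]))   # -> confusion
```

The sketch mirrors only the fusion arithmetic described in the abstract; in the paper these steps sit on top of classifiers trained on the facial and prosodic features mentioned above.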

Original language: English (US)
Pages (from-to): 424-428
Number of pages: 5
Journal: IEEE Transactions on Multimedia
Volume: 9
Issue number: 2
DOIs: https://doi.org/10.1109/TMM.2006.886310
State: Published - Feb 1, 2007

Fingerprint

  • Human computer interaction
  • Feature extraction

Keywords

  • Affect recognition
  • Affective computing
  • Emotion recognition
  • Multimodal human-computer interaction

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering

Cite this

Audio-visual affect recognition. / Zeng, Zhihong; Tu, Jilin; Liu, Ming; Huang, Thomas S.; Pianfetti, Brian; Roth, Dan; Levinson, Stephen.

In: IEEE Transactions on Multimedia, Vol. 9, No. 2, 01.02.2007, p. 424-428.

Research output: Contribution to journal › Article

Zeng, Z, Tu, J, Liu, M, Huang, TS, Pianfetti, B, Roth, D & Levinson, S 2007, 'Audio-visual affect recognition', IEEE Transactions on Multimedia, vol. 9, no. 2, pp. 424-428. https://doi.org/10.1109/TMM.2006.886310
Zeng Z, Tu J, Liu M, Huang TS, Pianfetti B, Roth D et al. Audio-visual affect recognition. IEEE Transactions on Multimedia. 2007 Feb 1;9(2):424-428. https://doi.org/10.1109/TMM.2006.886310
Zeng, Zhihong ; Tu, Jilin ; Liu, Ming ; Huang, Thomas S. ; Pianfetti, Brian ; Roth, Dan ; Levinson, Stephen. / Audio-visual affect recognition. In: IEEE Transactions on Multimedia. 2007 ; Vol. 9, No. 2. pp. 424-428.
@article{bb9905ca2c3147789bfd9d4548fdc9c5,
title = "Audio-visual affect recognition",
abstract = "The ability of a computer to detect and appropriately respond to changes in a user's affective state has significant implications to Human-Computer Interaction (HCI). In this paper, we present our efforts toward audio-visual affect recognition on 11 affective states customized for HCI application (four cognitive/motivational and seven basic affective states) of 20 nonactor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. The feature selection analysis shows that subjects are prone to use brow movement in face, pitch and energy in prosody to express their affects while speaking. For person-dependent recognition, we apply the voting method to combine the frame-based classification results from both audio and visual channels. The result shows 7.5{\%} improvement over the best unimodal performance. For person-independent test, we apply multistream HMM to combine the information from multiple component streams. This test shows 6.1{\%} improvement over the best component performance.",
keywords = "Affect recognition, Affective computing, Emotion recognition, Multimodal human-computer interaction",
author = "Zhihong Zeng and Jilin Tu and Ming Liu and Huang, {Thomas S.} and Brian Pianfetti and Dan Roth and Stephen Levinson",
year = "2007",
month = "2",
day = "1",
doi = "10.1109/TMM.2006.886310",
language = "English (US)",
volume = "9",
pages = "424--428",
journal = "IEEE Transactions on Multimedia",
issn = "1520-9210",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "2",

}

TY - JOUR

T1 - Audio-visual affect recognition

AU - Zeng, Zhihong

AU - Tu, Jilin

AU - Liu, Ming

AU - Huang, Thomas S.

AU - Pianfetti, Brian

AU - Roth, Dan

AU - Levinson, Stephen

PY - 2007/2/1

Y1 - 2007/2/1

N2 - The ability of a computer to detect and appropriately respond to changes in a user's affective state has significant implications to Human-Computer Interaction (HCI). In this paper, we present our efforts toward audio-visual affect recognition on 11 affective states customized for HCI application (four cognitive/motivational and seven basic affective states) of 20 nonactor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. The feature selection analysis shows that subjects are prone to use brow movement in face, pitch and energy in prosody to express their affects while speaking. For person-dependent recognition, we apply the voting method to combine the frame-based classification results from both audio and visual channels. The result shows 7.5% improvement over the best unimodal performance. For person-independent test, we apply multistream HMM to combine the information from multiple component streams. This test shows 6.1% improvement over the best component performance.

AB - The ability of a computer to detect and appropriately respond to changes in a user's affective state has significant implications to Human-Computer Interaction (HCI). In this paper, we present our efforts toward audio-visual affect recognition on 11 affective states customized for HCI application (four cognitive/motivational and seven basic affective states) of 20 nonactor subjects. A smoothing method is proposed to reduce the detrimental influence of speech on facial expression recognition. The feature selection analysis shows that subjects are prone to use brow movement in face, pitch and energy in prosody to express their affects while speaking. For person-dependent recognition, we apply the voting method to combine the frame-based classification results from both audio and visual channels. The result shows 7.5% improvement over the best unimodal performance. For person-independent test, we apply multistream HMM to combine the information from multiple component streams. This test shows 6.1% improvement over the best component performance.

KW - Affect recognition

KW - Affective computing

KW - Emotion recognition

KW - Multimodal human-computer interaction

UR - http://www.scopus.com/inward/record.url?scp=33846592328&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33846592328&partnerID=8YFLogxK

U2 - 10.1109/TMM.2006.886310

DO - 10.1109/TMM.2006.886310

M3 - Article

AN - SCOPUS:33846592328

VL - 9

SP - 424

EP - 428

JO - IEEE Transactions on Multimedia

JF - IEEE Transactions on Multimedia

SN - 1520-9210

IS - 2

ER -