Audio-visual affect recognition through multi-stream fused HMM for HCI

Zhihong Zeng, Jilin Tu, Brian Pianfetti, Ming Liu, Tong Zhang, Zhenqiu Zhang, Thomas S Huang, Stephen E Levinson

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Advances in computer processing power and emerging algorithms are allowing new ways of envisioning Human-Computer Interaction. This paper focuses on the development of a computing algorithm that uses audio and visual sensors to detect and track a user's affective state to aid computer decision making. Using our Multi-stream Fused Hidden Markov Model (MFHMM), we analyzed coupled audio and visual streams to detect 11 cognitive/emotive states. The MFHMM allows the building of an optimal connection among multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experimental results from 20 subjects in 660 sequences show that the MFHMM approach achieves an accuracy of 80.61%, which outperforms face-only HMM, pitch-only HMM, energy-only HMM, and independent HMM fusion.
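
The abstract compares the MFHMM against "independent HMM fusion" without defining that baseline. The sketch below shows one common reading of it, offered as an assumption rather than the authors' implementation: each stream gets its own HMM per class, the per-stream log-likelihoods are summed as if the streams were independent, and the highest-scoring class wins. It is not the MFHMM itself. The discrete observation symbols, the two-state toy models, the stream names, and the labels `state_a` and `state_b` are all hypothetical.

```python
import math

NEG_INF = float("-inf")


def _log(p):
    # Log of a probability, mapping zero to negative infinity.
    return math.log(p) if p > 0 else NEG_INF


def logsumexp(values):
    # Numerically stable log(sum(exp(v))) over a list of log values.
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_likelihood(obs, start, trans, emit):
    # Forward algorithm in log space for a discrete-observation HMM.
    n = len(start)
    alpha = [_log(start[i]) + _log(emit[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [
            logsumexp([alpha[i] + _log(trans[i][j]) for i in range(n)])
            + _log(emit[j][o])
            for j in range(n)
        ]
    return logsumexp(alpha)


def fuse_independent(observations, models):
    # Sum per-stream log-likelihoods for each class (independence
    # assumption across streams) and return the best class with all scores.
    scores = {
        label: sum(
            log_likelihood(observations[stream], *hmm)
            for stream, hmm in streams.items()
        )
        for label, streams in models.items()
    }
    return max(scores, key=scores.get), scores


def toy_hmm(p0):
    # Hypothetical two-state HMM over binary symbols; p0 is the
    # probability that the first state emits symbol 0.
    start = [0.5, 0.5]
    trans = [[0.9, 0.1], [0.1, 0.9]]
    emit = [[p0, 1 - p0], [p0 - 0.1, 1.1 - p0]]
    return start, trans, emit


# Hypothetical per-class, per-stream models (not taken from the paper).
models = {
    "state_a": {"face": toy_hmm(0.8), "pitch": toy_hmm(0.8)},
    "state_b": {"face": toy_hmm(0.3), "pitch": toy_hmm(0.3)},
}
# Streams may have different lengths because each is scored separately.
observations = {"face": [0, 0, 1, 0], "pitch": [0, 0, 0]}

predicted, scores = fuse_independent(observations, models)
print(predicted, scores)
```

Because each stream is scored in isolation, this baseline cannot exploit correlations between the audio and visual streams, which is the gap the abstract says the MFHMM's inter-stream connection is meant to address.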

Original language: English (US)
Title of host publication: Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005
Publisher: IEEE Computer Society
Pages: 967-972
Number of pages: 6
ISBN (Print): 0769523722, 9780769523729
DOIs: https://doi.org/10.1109/CVPR.2005.77
State: Published - Jan 1 2005
Event: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005 - San Diego, CA, United States
Duration: Jun 20 2005 → Jun 25 2005

Publication series

Name: Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005
Volume: II

Other

Other: 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005
Country: United States
City: San Diego, CA
Period: 6/20/05 → 6/25/05

Fingerprint

Hidden Markov models
Human computer interaction
Entropy
Fusion reactions
Decision making
Sensors
Processing

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Zeng, Z., Tu, J., Pianfetti, B., Liu, M., Zhang, T., Zhang, Z., ... Levinson, S. E. (2005). Audio-visual affect recognition through multi-stream fused HMM for HCI. In Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005 (pp. 967-972). [1467547] (Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005; Vol. II). IEEE Computer Society. https://doi.org/10.1109/CVPR.2005.77

Audio-visual affect recognition through multi-stream fused HMM for HCI. / Zeng, Zhihong; Tu, Jilin; Pianfetti, Brian; Liu, Ming; Zhang, Tong; Zhang, Zhenqiu; Huang, Thomas S; Levinson, Stephen E.

Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005. IEEE Computer Society, 2005. p. 967-972 1467547 (Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005; Vol. II).

Zeng, Z, Tu, J, Pianfetti, B, Liu, M, Zhang, T, Zhang, Z, Huang, TS & Levinson, SE 2005, Audio-visual affect recognition through multi-stream fused HMM for HCI. in Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005., 1467547, Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, vol. II, IEEE Computer Society, pp. 967-972, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005, San Diego, CA, United States, 6/20/05. https://doi.org/10.1109/CVPR.2005.77
Zeng Z, Tu J, Pianfetti B, Liu M, Zhang T, Zhang Z et al. Audio-visual affect recognition through multi-stream fused HMM for HCI. In Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005. IEEE Computer Society. 2005. p. 967-972. 1467547. (Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005). https://doi.org/10.1109/CVPR.2005.77
Zeng, Zhihong ; Tu, Jilin ; Pianfetti, Brian ; Liu, Ming ; Zhang, Tong ; Zhang, Zhenqiu ; Huang, Thomas S ; Levinson, Stephen E. / Audio-visual affect recognition through multi-stream fused HMM for HCI. Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005. IEEE Computer Society, 2005. pp. 967-972 (Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005).
@inproceedings{409fda99c88f41419c7625620dd78ea5,
title = "Audio-visual affect recognition through multi-stream fused HMM for HCI",
abstract = "Advances in computer processing power and emerging algorithms are allowing new ways of envisioning Human Computer Interaction. This paper focuses on the development of a computing algorithm that uses audio and visual sensors to detect and track a user's affective state to aid computer decision making. Using our Multi-stream Fused Hidden Markov Model (MFHMM), we analyzed coupled audio and visual streams to detect 11 cognitive/emotive states. The MFHMM allows the building of an optimal connection among multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experimental results from 20 subjects in 660 sequences show that the MFHMM approach performs with an accuracy of 80.61{\%} which outperforms face-only HMM, pitch-only HMM, energy-only HMM, and independent HMM fusion.",
author = "Zhihong Zeng and Jilin Tu and Brian Pianfetti and Ming Liu and Tong Zhang and Zhenqiu Zhang and Huang, {Thomas S} and Levinson, {Stephen E}",
year = "2005",
month = "1",
day = "1",
doi = "10.1109/CVPR.2005.77",
language = "English (US)",
isbn = "0769523722",
series = "Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005",
publisher = "IEEE Computer Society",
pages = "967--972",
booktitle = "Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005",

}

TY - GEN

T1 - Audio-visual affect recognition through multi-stream fused HMM for HCI

AU - Zeng, Zhihong

AU - Tu, Jilin

AU - Pianfetti, Brian

AU - Liu, Ming

AU - Zhang, Tong

AU - Zhang, Zhenqiu

AU - Huang, Thomas S

AU - Levinson, Stephen E

PY - 2005/1/1

Y1 - 2005/1/1

N2 - Advances in computer processing power and emerging algorithms are allowing new ways of envisioning Human Computer Interaction. This paper focuses on the development of a computing algorithm that uses audio and visual sensors to detect and track a user's affective state to aid computer decision making. Using our Multi-stream Fused Hidden Markov Model (MFHMM), we analyzed coupled audio and visual streams to detect 11 cognitive/emotive states. The MFHMM allows the building of an optimal connection among multiple streams according to the maximum entropy principle and the maximum mutual information criterion. Person-independent experimental results from 20 subjects in 660 sequences show that the MFHMM approach performs with an accuracy of 80.61% which outperforms face-only HMM, pitch-only HMM, energy-only HMM, and independent HMM fusion.

UR - http://www.scopus.com/inward/record.url?scp=24644432083&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=24644432083&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2005.77

DO - 10.1109/CVPR.2005.77

M3 - Conference contribution

AN - SCOPUS:24644432083

SN - 0769523722

SN - 9780769523729

T3 - Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005

SP - 967

EP - 972

BT - Proceedings - 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2005

PB - IEEE Computer Society

ER -