Multi-modal audio, video and physiological sensor learning for continuous emotion prediction

Kevin Brady, Youngjune Gwon, Pooya Khorrami, Elizabeth Godoy, William Campbell, Charlie Dagli, Thomas S. Huang

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The automatic determination of emotional state from multimedia content is an inherently challenging problem with a broad range of applications, including biomedical diagnostics, multimedia retrieval, and human-computer interfaces. The Audio Video Emotion Challenge (AVEC) 2016 provides a well-defined framework for developing and rigorously evaluating innovative approaches for estimating the arousal and valence states of emotion as a function of time. It presents the opportunity to investigate multimodal solutions that include audio, video, and physiological sensor signals. This paper provides an overview of our AVEC Emotion Challenge system, which uses multi-feature learning and fusion across all available modalities. It includes a number of technical contributions, including the development of novel high- and low-level features for modeling emotion in the audio, video, and physiological channels. Low-level features include modeling arousal in audio with a minimal set of prosody-based descriptors. High-level features are derived from supervised and unsupervised machine learning approaches based on sparse coding and deep learning. Finally, a state-space estimation approach is applied for score fusion, demonstrating the importance of exploiting the time-series nature of the arousal and valence states. The resulting system outperforms the baseline systems [10] on the test evaluation set, achieving a Concordance Correlation Coefficient (CCC) of 0.770 vs. 0.702 (baseline) for arousal and 0.687 vs. 0.638 for valence. Future work will focus on exploiting the time-varying nature of individual channels in the multi-modal framework.
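For context, the sketch below (not taken from the paper) shows one common way to compute the Concordance Correlation Coefficient used to score continuous arousal/valence predictions; the function name, NumPy usage, and the synthetic arousal traces are illustrative assumptions only.

    # Illustrative sketch: Concordance Correlation Coefficient (CCC), the metric
    # reported above for continuous arousal/valence prediction.
    import numpy as np

    def concordance_correlation_coefficient(y_true, y_pred):
        # CCC = 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)
        mean_true, mean_pred = y_true.mean(), y_pred.mean()
        cov = ((y_true - mean_true) * (y_pred - mean_pred)).mean()
        return 2.0 * cov / (y_true.var() + y_pred.var() + (mean_true - mean_pred) ** 2)

    # Hypothetical gold-standard and predicted arousal traces (assumed data, not from AVEC 2016).
    t = np.linspace(0.0, 10.0, 1000)
    arousal_gold = np.sin(t)
    arousal_pred = 0.9 * np.sin(t) + 0.05
    print(concordance_correlation_coefficient(arousal_gold, arousal_pred))

A perfect prediction yields a CCC of 1.0; unlike Pearson correlation, the metric also penalizes differences in mean and scale between the predicted and gold-standard traces.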

Original language: English (US)
Title of host publication: AVEC 2016 - Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, co-located with ACM Multimedia 2016
Publisher: Association for Computing Machinery
Pages: 97-104
Number of pages: 8
ISBN (Electronic): 9781450345163
DOIs
State: Published - Oct 16 2016
Event: 6th International Workshop on Audio/Visual Emotion Challenge, AVEC 2016 - Amsterdam, Netherlands
Duration: Oct 16 2016 → …

Publication series

Name: AVEC 2016 - Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge, co-located with ACM Multimedia 2016

Other

Other: 6th International Workshop on Audio/Visual Emotion Challenge, AVEC 2016
Country/Territory: Netherlands
City: Amsterdam
Period: 10/16/16 → …

Keywords

  • Affective Computing
  • CNN
  • Challenge
  • Deep Learning
  • Emotion Recognition
  • Facial Expression
  • Sparse Coding
  • Speech

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Software
  • Computer Vision and Pattern Recognition
  • Computer Graphics and Computer-Aided Design
