Understanding the dynamics of social interactions: A multi-modal multi-view approach

Rim Trabelsi, Jagannadan Varadarajan, Le Zhang, Issam Jabri, Yong Pei, Fethi Smach, Ammar Bouallegue, Pierre Moulin

Research output: Contribution to journal / Article / peer-review


In this article, we address the problem of understanding human-to-human interactions, a fundamental component of social event analysis. Inspired by the recent success of multi-modal visual data in many recognition tasks, we propose a novel approach to modeling dyadic interactions using features extracted from synchronized 3D skeleton coordinates, depth, and Red Green Blue (RGB) sequences. From skeleton data, we extract new view-invariant proxemic features, called the Unified Proxemic Descriptor (UProD), which incorporates intrinsic and extrinsic distances between the two interacting subjects. We also introduce a novel key frame selection method that identifies salient instants of the interaction sequence based on the joints' energy. From Red Green Blue Depth (RGBD) videos, more holistic CNN features are extracted by applying adaptive, pre-trained Convolutional Neural Networks (CNNs) to optical flow frames. To better understand the dynamics of interactions, we extend dyadic interaction analysis to a previously unaddressed problem: distinguishing the active interactor from the passive one. Extensive experiments were carried out on four multi-modal, multi-view interaction datasets. The experimental results demonstrate the superiority of our proposed techniques over state-of-the-art approaches.
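The abstract does not give the exact formulation of UProD or of the joints' energy, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' method. It assumes each subject's skeleton is a list of 3D joint coordinates, takes "intrinsic" distances to mean joint-to-joint distances within one subject and "extrinsic" distances to mean joint-to-joint distances across the two subjects, and defines per-frame energy as the sum of squared joint displacements from the previous frame (a kinetic-energy proxy with unit mass and unit time step). All function names (`proxemic_descriptor`, `joint_energy`, `select_key_frames`, and so on) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) in camera coordinates
Skeleton = Sequence[Joint]          # joints of one subject (or both) in one frame


def intrinsic_distances(skel: Skeleton) -> List[float]:
    """Assumed intrinsic part: pairwise distances between joints of one subject (i < j)."""
    n = len(skel)
    return [math.dist(skel[i], skel[j]) for i in range(n) for j in range(i + 1, n)]


def extrinsic_distances(a: Skeleton, b: Skeleton) -> List[float]:
    """Assumed extrinsic part: distance from every joint of A to every joint of B."""
    return [math.dist(p, q) for p in a for q in b]


def proxemic_descriptor(a: Skeleton, b: Skeleton) -> List[float]:
    """Hypothetical per-frame descriptor: intrinsic(A) + intrinsic(B) + extrinsic(A, B).

    Euclidean distances are unchanged by a rigid rotation or translation of the
    camera, which is why distance-based features are view-invariant.
    """
    return intrinsic_distances(a) + intrinsic_distances(b) + extrinsic_distances(a, b)


def joint_energy(frames: Sequence[Skeleton]) -> List[float]:
    """Assumed energy: sum of squared joint displacements from the previous frame.

    Each frame holds the joints of both subjects concatenated in a fixed order.
    Frame 0 has no predecessor, so its energy is set to 0.
    """
    if not frames:
        return []
    energies = [0.0]
    for prev, cur in zip(frames, frames[1:]):
        energies.append(sum(math.dist(p, c) ** 2 for p, c in zip(prev, cur)))
    return energies


def select_key_frames(frames: Sequence[Skeleton], k: int) -> List[int]:
    """Return the indices of the k highest-energy frames, in temporal order."""
    energy = joint_energy(frames)
    ranked = sorted(range(len(energy)), key=lambda t: energy[t], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    subject_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    subject_b = [(3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
    print(proxemic_descriptor(subject_a, subject_b))

    # Toy sequence with a single joint that moves once, between frames 1 and 2.
    sequence = [[(0.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)], [(3.0, 4.0, 0.0)]]
    print(select_key_frames(sequence, k=1))
```

Returning the selected indices in temporal order, rather than by energy rank, keeps the key frames usable as a shortened sequence for any later temporal modeling; this is a design choice of the sketch, not something the abstract specifies.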

Original language: English (US)
Article number: 15
Journal: ACM Transactions on Multimedia Computing, Communications and Applications
Issue number: 1s
State: Published - Feb 2019

Keywords

  • Active/passive subjects
  • CNN
  • Interaction recognition
  • Multi-modal data
  • RGBD
  • Skeleton

ASJC Scopus subject areas

  • Hardware and Architecture
  • Computer Networks and Communications

