TY - JOUR
T1 - Understanding the dynamics of social interactions
T2 - A multi-modal multi-view approach
AU - Trabelsi, Rim
AU - Varadarajan, Jagannadan
AU - Zhang, Le
AU - Jabri, Issam
AU - Pei, Yong
AU - Smach, Fethi
AU - Bouallegue, Ammar
AU - Moulin, Pierre
N1 - Publisher Copyright:
© 2019 Association for Computing Machinery.
PY - 2019/2
Y1 - 2019/2
N2 - In this article, we address the problem of understanding human-to-human interactions as a fundamental component of social event analysis. Inspired by the recent success of multi-modal visual data in many recognition tasks, we propose a novel approach to model dyadic interactions by means of features extracted from synchronized 3D skeleton coordinates, depth, and Red Green Blue (RGB) sequences. From skeleton data, we extract new view-invariant proxemic features, named the Unified Proxemic Descriptor (UProD), which incorporates both intrinsic and extrinsic distances between two interacting subjects. A novel key frame selection method is introduced to identify salient instants of the interaction sequence based on the joints' energy. From Red Green Blue Depth (RGBD) videos, more holistic CNN features are extracted by applying an adaptive pre-trained Convolutional Neural Network (CNN) to optical flow frames. To better understand the dynamics of interactions, we expand the boundaries of dyadic interaction analysis by proposing a fundamentally new model for the previously untreated problem of discerning the active interactor from the passive one. Extensive experiments have been carried out on four multi-modal and multi-view interaction datasets. The experimental results demonstrate the superiority of our proposed techniques over state-of-the-art approaches.
AB - In this article, we address the problem of understanding human-to-human interactions as a fundamental component of social event analysis. Inspired by the recent success of multi-modal visual data in many recognition tasks, we propose a novel approach to model dyadic interactions by means of features extracted from synchronized 3D skeleton coordinates, depth, and Red Green Blue (RGB) sequences. From skeleton data, we extract new view-invariant proxemic features, named the Unified Proxemic Descriptor (UProD), which incorporates both intrinsic and extrinsic distances between two interacting subjects. A novel key frame selection method is introduced to identify salient instants of the interaction sequence based on the joints' energy. From Red Green Blue Depth (RGBD) videos, more holistic CNN features are extracted by applying an adaptive pre-trained Convolutional Neural Network (CNN) to optical flow frames. To better understand the dynamics of interactions, we expand the boundaries of dyadic interaction analysis by proposing a fundamentally new model for the previously untreated problem of discerning the active interactor from the passive one. Extensive experiments have been carried out on four multi-modal and multi-view interaction datasets. The experimental results demonstrate the superiority of our proposed techniques over state-of-the-art approaches.
KW - Active/passive subjects
KW - CNN
KW - Interaction recognition
KW - Multi-modal data
KW - RGBD
KW - Skeleton
UR - http://www.scopus.com/inward/record.url?scp=85062342616&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85062342616&partnerID=8YFLogxK
U2 - 10.1145/3300937
DO - 10.1145/3300937
M3 - Article
AN - SCOPUS:85062342616
SN - 1551-6857
VL - 15
JO - ACM Transactions on Multimedia Computing, Communications and Applications
JF - ACM Transactions on Multimedia Computing, Communications and Applications
IS - 1s
M1 - 15
ER -