Joint Estimation of Human Pose and Conversational Groups from Social Scenes

Jagannadan Varadarajan, Ramanathan Subramanian, Samuel Rota Bulò, Narendra Ahuja, Oswald Lanz, Elisa Ricci

Research output: Contribution to journal › Article

Abstract

Despite many attempts in the last few years, automatic analysis of social scenes captured by wide-angle camera networks remains a very challenging task due to the low resolution of targets, background clutter, and frequent and persistent occlusions. In this paper, we present a novel framework for jointly estimating (i) the head and body orientations of targets and (ii) conversational groups, called F-formations, from social scenes. In contrast to prior works that have (a) exploited the limited range of head and body orientations to jointly learn both, or (b) employed the mutual head (but not body) pose of interactors for deducing F-formations, we propose a weakly-supervised learning algorithm for joint inference. Our algorithm employs body pose as the primary cue for F-formation estimation, and uses an alternating optimization strategy to iteratively refine the F-formation and pose estimates. We demonstrate the increased efficacy of joint inference over the state-of-the-art via extensive experiments on three social datasets.

Original language: English (US)
Pages (from-to): 410-429
Number of pages: 20
Journal: International Journal of Computer Vision
Volume: 126
Issue number: 2-4
DOI: 10.1007/s11263-017-1026-6
State: Published - Apr 1 2018

Fingerprint

Supervised learning
Learning algorithms
Cameras
Experiments

Keywords

  • Conversational groups
  • Convex optimization
  • F-formation estimation
  • Head and body pose estimation
  • Semi-supervised learning
  • Video surveillance

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence

Cite this

Joint Estimation of Human Pose and Conversational Groups from Social Scenes. / Varadarajan, Jagannadan; Subramanian, Ramanathan; Bulò, Samuel Rota; Ahuja, Narendra; Lanz, Oswald; Ricci, Elisa.

In: International Journal of Computer Vision, Vol. 126, No. 2-4, 01.04.2018, p. 410-429.

Varadarajan, Jagannadan ; Subramanian, Ramanathan ; Bulò, Samuel Rota ; Ahuja, Narendra ; Lanz, Oswald ; Ricci, Elisa. / Joint Estimation of Human Pose and Conversational Groups from Social Scenes. In: International Journal of Computer Vision. 2018 ; Vol. 126, No. 2-4. pp. 410-429.
@article{066bbbbf7bb14f518aaf9f0050e37188,
title = "Joint Estimation of Human Pose and Conversational Groups from Social Scenes",
abstract = "Despite many attempts in the last few years, automatic analysis of social scenes captured by wide-angle camera networks remains a very challenging task due to the low resolution of targets, background clutter and frequent and persistent occlusions. In this paper, we present a novel framework for jointly estimating (i) head, body orientations of targets and (ii) conversational groups called F-formations from social scenes. In contrast to prior works that have (a) exploited the limited range of head and body orientations to jointly learn both, or (b) employed the mutual head (but not body) pose of interactors for deducing F-formations, we propose a weakly-supervised learning algorithm for joint inference. Our algorithm employs body pose as the primary cue for F-formation estimation, and an alternating optimization strategy is proposed to iteratively refine F-formation and pose estimates. We demonstrate the increased efficacy of joint inference over the state-of-the-art via extensive experiments on three social datasets.",
keywords = "Conversational groups, Convex optimization, F-formation estimation, Head and body pose estimation, Semi-supervised learning, Video surveillance",
author = "Jagannadan Varadarajan and Ramanathan Subramanian and Bul{\`o}, {Samuel Rota} and Narendra Ahuja and Oswald Lanz and Elisa Ricci",
year = "2018",
month = "4",
day = "1",
doi = "10.1007/s11263-017-1026-6",
language = "English (US)",
volume = "126",
pages = "410--429",
journal = "International Journal of Computer Vision",
issn = "0920-5691",
publisher = "Springer Netherlands",
number = "2-4",

}

TY - JOUR

T1 - Joint Estimation of Human Pose and Conversational Groups from Social Scenes

AU - Varadarajan, Jagannadan

AU - Subramanian, Ramanathan

AU - Bulò, Samuel Rota

AU - Ahuja, Narendra

AU - Lanz, Oswald

AU - Ricci, Elisa

PY - 2018/4/1

Y1 - 2018/4/1

AB - Despite many attempts in the last few years, automatic analysis of social scenes captured by wide-angle camera networks remains a very challenging task due to the low resolution of targets, background clutter and frequent and persistent occlusions. In this paper, we present a novel framework for jointly estimating (i) head, body orientations of targets and (ii) conversational groups called F-formations from social scenes. In contrast to prior works that have (a) exploited the limited range of head and body orientations to jointly learn both, or (b) employed the mutual head (but not body) pose of interactors for deducing F-formations, we propose a weakly-supervised learning algorithm for joint inference. Our algorithm employs body pose as the primary cue for F-formation estimation, and an alternating optimization strategy is proposed to iteratively refine F-formation and pose estimates. We demonstrate the increased efficacy of joint inference over the state-of-the-art via extensive experiments on three social datasets.

KW - Conversational groups

KW - Convex optimization

KW - F-formation estimation

KW - Head and body pose estimation

KW - Semi-supervised learning

KW - Video surveillance

UR - http://www.scopus.com/inward/record.url?scp=85023748423&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85023748423&partnerID=8YFLogxK

U2 - 10.1007/s11263-017-1026-6

DO - 10.1007/s11263-017-1026-6

M3 - Article

AN - SCOPUS:85023748423

VL - 126

SP - 410

EP - 429

JO - International Journal of Computer Vision

JF - International Journal of Computer Vision

SN - 0920-5691

IS - 2-4

ER -