Multiple Granularity Modeling: A Coarse-to-Fine Framework for Fine-grained Action Analysis

Bingbing Ni, Vignesh R. Paramathayalan, Teng Li, Pierre Moulin

Research output: Contribution to journal, Article, peer-review

Abstract

Detecting fine-grained human actions in video sequences is challenging. In this work, we propose to decompose this difficult analytic problem into two sequential tasks of increasing granularity. First, we infer the coarse interaction status, i.e., which object is being manipulated and where the interaction occurs. To address the frequent mutual occlusions that arise during manipulation, we propose an interaction tracking framework in which hand (object) positions and interaction status are jointly tracked by explicitly modeling the occlusion context. Second, for a given query sequence, the inferred interaction status is used to efficiently identify a small set of candidate matching sequences from the annotated training set. Frame-level action labels are then transferred to the query sequence by establishing a matching between the query and candidate sequences. Comprehensive experiments on two challenging fine-grained activity datasets show that (1) the proposed interaction tracking approach achieves high tracking accuracy for multiple mutually occluded objects (hands) during manipulation actions; and (2) the proposed multiple granularity analysis framework achieves superior action detection performance compared with state-of-the-art methods.
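
The second, finer-grained stage (candidate retrieval followed by frame-level label transfer) can be illustrated with a minimal sketch. The abstract does not specify how candidates are ranked or how sequences are matched, so everything here is an assumption made for illustration: candidates are ranked by Jaccard overlap of their interaction statuses, matching is done with plain dynamic time warping (DTW), and the names `status_overlap`, `dtw_path`, `transfer_labels`, `mismatch_penalty`, and `top_k`, as well as the toy data, are hypothetical rather than taken from the paper.

```python
from math import dist, inf

def status_overlap(a, b):
    """Jaccard overlap between the interaction statuses seen in two sequences."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def dtw_path(query, cand, mismatch_penalty=1.0):
    """Align two lists of (feature, status) frames with dynamic time warping.

    Returns (total_cost, path), where path is a list of (query_idx, cand_idx).
    The status mismatch penalty is an illustrative assumption.
    """
    n, m = len(query), len(cand)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            (fq, sq), (fc, sc) = query[i - 1], cand[j - 1]
            d = dist(fq, fc) + (mismatch_penalty if sq != sc else 0.0)
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end of both sequences to recover the alignment.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path

def transfer_labels(query, training, top_k=2):
    """Coarse step: keep the top_k training sequences by status overlap.
    Fine step: align each candidate with DTW and copy labels from the best one.

    query: list of (feature, status); training: list of (frames, labels).
    """
    q_status = [s for _, s in query]
    ranked = sorted(training,
                    key=lambda t: status_overlap(q_status, [s for _, s in t[0]]),
                    reverse=True)
    _, path, labels = min((dtw_path(query, frames) + (labels,)
                           for frames, labels in ranked[:top_k]),
                          key=lambda r: r[0])
    out = [None] * len(query)
    for qi, ci in path:
        out[qi] = labels[ci]  # a later match for the same query frame overwrites
    return out

# Toy data: 1-D features, status = manipulated object (all values hypothetical).
training = [
    ([([0.0], "cup"), ([1.0], "cup"), ([2.0], "cup")], ["reach", "pour", "pour"]),
    ([([5.0], "knife"), ([6.0], "knife")], ["cut", "cut"]),
]
query = [([0.1], "cup"), ([0.9], "cup"), ([1.1], "cup"), ([2.1], "cup")]
predicted = transfer_labels(query, training, top_k=1)
```

Filtering by interaction status before alignment keeps the quadratic-cost matching step confined to a handful of sequences, which is the efficiency argument the abstract makes for the coarse-to-fine ordering.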

Original language: English (US)
Pages (from-to): 28-43
Number of pages: 16
Journal: International Journal of Computer Vision
Volume: 120
Issue number: 1
State: Published - Oct 1 2016

Keywords

  • Fine-grained action detection
  • Multiple granularity
  • Multiple object tracking
  • Nonparametric label transfer

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition
  • Artificial Intelligence
