TY - JOUR
T1 - Multiple Granularity Modeling
T2 - A Coarse-to-Fine Framework for Fine-grained Action Analysis
AU - Ni, Bingbing
AU - Paramathayalan, Vignesh R.
AU - Li, Teng
AU - Moulin, Pierre
N1 - Publisher Copyright: © 2016, Springer Science+Business Media New York.
PY - 2016/10/1
Y1 - 2016/10/1
N2 - Detecting fine-grained human actions in video sequences is challenging. In this work, we propose to decompose this difficult analytic problem into two sequential tasks of increasing granularity. First, we infer the coarse interaction status, i.e., which object is being manipulated and where the interaction occurs. To address the frequent mutual occlusions that arise during manipulation, we propose an interaction tracking framework in which hand (object) position and interaction status are jointly tracked by explicitly modeling the occlusion context. Second, for a given query sequence, the inferred interaction status is used to efficiently identify a small set of candidate matching sequences from the annotated training set. Frame-level action labels are then transferred to the query sequence by matching the query against the candidate sequences. Comprehensive experiments on two challenging fine-grained activity datasets show that: (1) the proposed interaction tracking approach achieves high tracking accuracy for multiple mutually occluded objects (hands) during manipulation actions; and (2) the proposed multiple granularity analysis framework achieves superior action detection performance compared with state-of-the-art methods.
AB - Detecting fine-grained human actions in video sequences is challenging. In this work, we propose to decompose this difficult analytic problem into two sequential tasks of increasing granularity. First, we infer the coarse interaction status, i.e., which object is being manipulated and where the interaction occurs. To address the frequent mutual occlusions that arise during manipulation, we propose an interaction tracking framework in which hand (object) position and interaction status are jointly tracked by explicitly modeling the occlusion context. Second, for a given query sequence, the inferred interaction status is used to efficiently identify a small set of candidate matching sequences from the annotated training set. Frame-level action labels are then transferred to the query sequence by matching the query against the candidate sequences. Comprehensive experiments on two challenging fine-grained activity datasets show that: (1) the proposed interaction tracking approach achieves high tracking accuracy for multiple mutually occluded objects (hands) during manipulation actions; and (2) the proposed multiple granularity analysis framework achieves superior action detection performance compared with state-of-the-art methods.
KW - Fine-grained action detection
KW - Multiple granularity
KW - Multiple object tracking
KW - Nonparametric label transfer
UR - http://www.scopus.com/inward/record.url?scp=84959330850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959330850&partnerID=8YFLogxK
U2 - 10.1007/s11263-016-0891-8
DO - 10.1007/s11263-016-0891-8
M3 - Article
AN - SCOPUS:84959330850
SN - 0920-5691
VL - 120
SP - 28
EP - 43
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 1
ER -