TY - GEN
T1 - Multiple granularity analysis for fine-grained action detection
AU - Ni, Bingbing
AU - Paramathayalan, Vignesh R.
AU - Moulin, Pierre
N1 - Publisher Copyright:
© 2014 IEEE.
PY - 2014/9/24
Y1 - 2014/9/24
N2 - We propose to decompose the fine-grained human activity analysis problem into two sequential tasks with increasing granularity. Firstly, we infer the coarse interaction status, i.e., which object is being manipulated and where it is. Knowing that the major challenge is frequent mutual occlusions during manipulation, we propose an 'interaction tracking' framework in which hand/object position and interaction status are jointly tracked by explicitly modeling the contextual information between mutual occlusion and interaction status. Secondly, the inferred hand/object position and interaction status are utilized to provide 1) more compact feature pooling by effectively pruning a large number of motion features from irrelevant spatio-temporal positions and 2) discriminative action detection by a granularity fusion strategy. Comprehensive experiments on two challenging fine-grained activity datasets (i.e., cooking action) show that the proposed framework achieves high accuracy/robustness in tracking multiple mutually occluded hands/objects during manipulation as well as significant performance improvement on fine-grained action detection over state-of-the-art methods.
AB - We propose to decompose the fine-grained human activity analysis problem into two sequential tasks with increasing granularity. Firstly, we infer the coarse interaction status, i.e., which object is being manipulated and where it is. Knowing that the major challenge is frequent mutual occlusions during manipulation, we propose an 'interaction tracking' framework in which hand/object position and interaction status are jointly tracked by explicitly modeling the contextual information between mutual occlusion and interaction status. Secondly, the inferred hand/object position and interaction status are utilized to provide 1) more compact feature pooling by effectively pruning a large number of motion features from irrelevant spatio-temporal positions and 2) discriminative action detection by a granularity fusion strategy. Comprehensive experiments on two challenging fine-grained activity datasets (i.e., cooking action) show that the proposed framework achieves high accuracy/robustness in tracking multiple mutually occluded hands/objects during manipulation as well as significant performance improvement on fine-grained action detection over state-of-the-art methods.
KW - action detection
KW - interaction tracking
KW - multiple granularity
UR - http://www.scopus.com/inward/record.url?scp=84911397627&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84911397627&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2014.102
DO - 10.1109/CVPR.2014.102
M3 - Conference contribution
AN - SCOPUS:84911397627
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 756
EP - 763
BT - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
PB - IEEE Computer Society
T2 - 27th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2014
Y2 - 23 June 2014 through 28 June 2014
ER -