TY - GEN
T1 - Pipelining localized semantic features for fine-grained action recognition
AU - Zhou, Yang
AU - Ni, Bingbing
AU - Yan, Shuicheng
AU - Moulin, Pierre
AU - Tian, Qi
PY - 2014
Y1 - 2014
AB - In fine-grained action (object manipulation) recognition, it is important to encode object semantic (contextual) information, i.e., which object is being manipulated and how it is being operated on. However, previous methods for action recognition often represent semantic information in a global and coarse way and therefore cannot cope with fine-grained actions. In this work, we propose a representation and classification pipeline that seamlessly incorporates localized semantic information into every processing step of fine-grained action recognition. In the feature extraction stage, we exploit the geometric relationships between local motion features and the surrounding objects. In the feature encoding stage, we develop a semantic-grouped locality-constrained linear coding (SG-LLC) method that captures the joint distribution between motion and object-in-use information. Finally, we propose a semantic-aware multiple kernel learning framework (SA-MKL) that utilizes the empirical joint distribution between action and object type for more discriminative action classification. Extensive experiments are performed on the large-scale and challenging fine-grained MPII cooking action dataset. The results show that by effectively incorporating localized semantic information into the action representation and classification pipeline, we significantly improve fine-grained action classification performance over existing methods.
UR - http://www.scopus.com/inward/record.url?scp=84906501135&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906501135&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-10593-2_32
DO - 10.1007/978-3-319-10593-2_32
M3 - Conference contribution
AN - SCOPUS:84906501135
SN - 9783319105925
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 481
EP - 496
BT - Computer Vision, ECCV 2014 - 13th European Conference, Proceedings
PB - Springer
T2 - 13th European Conference on Computer Vision, ECCV 2014
Y2 - 6 September 2014 through 12 September 2014
ER -