TY - GEN
T1 - Integrating multi-stage depth-induced contextual information for human action recognition and localization
AU - Ni, Bingbing
AU - Pei, Yong
AU - Liang, Zhujin
AU - Lin, Liang
AU - Moulin, Pierre
PY - 2013/8/20
Y1 - 2013/8/20
N2 - Human action recognition and localization is a challenging vision task with promising applications. Recently developed commodity depth sensors (e.g., Microsoft Kinect) have opened up new opportunities for tackling this problem, and several human motion features based on depth images have been developed for action representation. However, how depth information can be effectively adopted in middle- or high-level representations for action detection, in particular the depth-induced three-dimensional contextual information for modeling human-human, human-object, and human-surroundings interactions, has not yet been explored. In this paper, we propose a novel action recognition and localization framework that effectively fuses depth-induced contextual information from different levels of the processing pipeline to understand various interactions. First, the depth image is combined with the grayscale image for more robust detection of human subjects and objects. Second, three-dimensional spatial and temporal relationships among human subjects and objects are represented based on the combination of grayscale and depth images. Third, depth information is further utilized to represent different types of indoor scenes. Finally, we fuse this multi-stage depth-induced contextual information into a unified action detection framework. Extensive experiments on a challenging grayscale + depth human action detection benchmark database demonstrate the effectiveness of the depth-induced contextual information and the high detection accuracy of the proposed framework.
AB - Human action recognition and localization is a challenging vision task with promising applications. Recently developed commodity depth sensors (e.g., Microsoft Kinect) have opened up new opportunities for tackling this problem, and several human motion features based on depth images have been developed for action representation. However, how depth information can be effectively adopted in middle- or high-level representations for action detection, in particular the depth-induced three-dimensional contextual information for modeling human-human, human-object, and human-surroundings interactions, has not yet been explored. In this paper, we propose a novel action recognition and localization framework that effectively fuses depth-induced contextual information from different levels of the processing pipeline to understand various interactions. First, the depth image is combined with the grayscale image for more robust detection of human subjects and objects. Second, three-dimensional spatial and temporal relationships among human subjects and objects are represented based on the combination of grayscale and depth images. Third, depth information is further utilized to represent different types of indoor scenes. Finally, we fuse this multi-stage depth-induced contextual information into a unified action detection framework. Extensive experiments on a challenging grayscale + depth human action detection benchmark database demonstrate the effectiveness of the depth-induced contextual information and the high detection accuracy of the proposed framework.
UR - http://www.scopus.com/inward/record.url?scp=84881515103&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84881515103&partnerID=8YFLogxK
U2 - 10.1109/FG.2013.6553756
DO - 10.1109/FG.2013.6553756
M3 - Conference contribution
AN - SCOPUS:84881515103
SN - 9781467355452
T3 - 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013
BT - 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013
T2 - 2013 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition, FG 2013
Y2 - 22 April 2013 through 26 April 2013
ER -