TY - GEN
T1 - Cross-dataset action detection
AU - Cao, Liangliang
AU - Liu, Zicheng
AU - Huang, Thomas S.
PY - 2010
Y1 - 2010
N2 - In recent years, much research has been carried out on recognizing human actions from video clips. To learn an effective action classifier, most previous approaches rely on a sufficient number of training labels. When required to recognize actions in a different dataset, these approaches have to re-train the model using new labels. However, labeling video sequences is a very tedious and time-consuming task, especially when detailed spatial locations and time durations are required. In this paper, we propose an adaptive action detection approach which reduces the requirement for training labels and is able to handle the task of cross-dataset action detection with few or no extra training labels. Our approach combines model adaptation and action detection into a Maximum a Posteriori (MAP) estimation framework, which explores the spatial-temporal coherence of actions and makes good use of prior information that can be obtained without supervision. Our approach obtains state-of-the-art results on the KTH action dataset using only 50% of the training labels used by traditional approaches. Furthermore, we show that our approach is effective for cross-dataset detection, adapting the model trained on KTH to two other challenging datasets.
AB - In recent years, much research has been carried out on recognizing human actions from video clips. To learn an effective action classifier, most previous approaches rely on a sufficient number of training labels. When required to recognize actions in a different dataset, these approaches have to re-train the model using new labels. However, labeling video sequences is a very tedious and time-consuming task, especially when detailed spatial locations and time durations are required. In this paper, we propose an adaptive action detection approach which reduces the requirement for training labels and is able to handle the task of cross-dataset action detection with few or no extra training labels. Our approach combines model adaptation and action detection into a Maximum a Posteriori (MAP) estimation framework, which explores the spatial-temporal coherence of actions and makes good use of prior information that can be obtained without supervision. Our approach obtains state-of-the-art results on the KTH action dataset using only 50% of the training labels used by traditional approaches. Furthermore, we show that our approach is effective for cross-dataset detection, adapting the model trained on KTH to two other challenging datasets.
UR - http://www.scopus.com/inward/record.url?scp=77955989314&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955989314&partnerID=8YFLogxK
U2 - 10.1109/CVPR.2010.5539875
DO - 10.1109/CVPR.2010.5539875
M3 - Conference contribution
AN - SCOPUS:77955989314
SN - 9781424469840
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 1998
EP - 2005
BT - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
T2 - 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2010
Y2 - 13 June 2010 through 18 June 2010
ER -