TY - GEN
T1 - SIFT-bag kernel for video event analysis
AU - Zhou, Xi
AU - Zhuang, Xiaodan
AU - Yan, Shuicheng
AU - Chang, Shih Fu
AU - Hasegawa-Johnson, Mark
AU - Huang, Thomas S.
PY - 2008
Y1 - 2008
N2 - In this work, we present a SIFT-Bag based generative-to-discriminative framework for addressing the problem of video event recognition in unconstrained news videos. In the generative stage, each video clip is encoded as a bag of SIFT feature vectors, the distribution of which is described by a Gaussian Mixture Model (GMM). In the discriminative stage, the SIFT-Bag Kernel is designed for characterizing the property of Kullback-Leibler divergence between the specialized GMMs of any two video clips, and then this kernel is utilized for supervised learning in two ways. On one hand, this kernel is further refined in discriminating power for centroid-based video event classification by using the Within-Class Covariance Normalization approach, which depresses the kernel components with high variability for video clips of the same event. On the other hand, the SIFT-Bag Kernel is used in a Support Vector Machine for margin-based video event classification. Finally, the outputs from these two classifiers are fused together for the final decision. The experiments on the TRECVID 2005 corpus demonstrate that the mean average precision is boosted from the best reported 38.2% in [36] to 60.4% based on our new framework.
AB - In this work, we present a SIFT-Bag based generative-to-discriminative framework for addressing the problem of video event recognition in unconstrained news videos. In the generative stage, each video clip is encoded as a bag of SIFT feature vectors, the distribution of which is described by a Gaussian Mixture Model (GMM). In the discriminative stage, the SIFT-Bag Kernel is designed for characterizing the property of Kullback-Leibler divergence between the specialized GMMs of any two video clips, and then this kernel is utilized for supervised learning in two ways. On one hand, this kernel is further refined in discriminating power for centroid-based video event classification by using the Within-Class Covariance Normalization approach, which depresses the kernel components with high variability for video clips of the same event. On the other hand, the SIFT-Bag Kernel is used in a Support Vector Machine for margin-based video event classification. Finally, the outputs from these two classifiers are fused together for the final decision. The experiments on the TRECVID 2005 corpus demonstrate that the mean average precision is boosted from the best reported 38.2% in [36] to 60.4% based on our new framework.
KW - Kernel design
KW - SIFT-bag
KW - Video event recognition
KW - Within-class covariation normalization
UR - http://www.scopus.com/inward/record.url?scp=70350676914&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350676914&partnerID=8YFLogxK
U2 - 10.1145/1459359.1459391
DO - 10.1145/1459359.1459391
M3 - Conference contribution
AN - SCOPUS:70350676914
SN - 9781605583037
T3 - MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops
SP - 229
EP - 238
BT - MM'08 - Proceedings of the 2008 ACM International Conference on Multimedia, with co-located Symposium and Workshops
T2 - 16th ACM International Conference on Multimedia, MM '08
Y2 - 26 October 2008 through 31 October 2008
ER -