Detecting human actions in surveillance videos

Ming Yang, Shuiwang Ji, Wei Xu, Jinjun Wang, Fengjun Lv, Kai Yu, Yihong Gong, Mert Dikmen, Dennis J. Lin, Thomas S. Huang

Research output: Contribution to conference › Paper

Abstract

This notebook paper summarizes Team NEC-UIUC's approaches for the TRECVid 2009 Evaluation of Surveillance Event Detection. Our submissions include two types of systems. One system employs a brute-force search, testing each space-time location in the video with a binary classifier to decide whether a specific event occurs. The other system takes advantage of human detection and tracking to avoid the costly brute-force search, and evaluates the candidate space-time cubes by combining 3D convolutional neural networks (CNN) and SVM classifiers based on bag-of-words local features to detect the presence of events of interest. Via thorough cross-validation on the development set, we select the combining weights and thresholds that minimize the detection cost rates (DCR). Our systems achieve good performance on event categories that involve actions of a single person, e.g., CellToEar, ObjectPut, and Pointing.
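The abstract describes fusing two per-cube classifier scores (a 3D-CNN score and a bag-of-words SVM score) with a combining weight and a decision threshold selected on a development set to minimize the detection cost rate (DCR). The sketch below illustrates that selection step under stated assumptions; the function names, the grid search, and the miss/false-alarm cost weights are illustrative, not the paper's actual values or implementation.

```python
def dcr(scores, labels, threshold, miss_cost=10.0, fa_cost=1.0):
    """Simplified detection cost rate: weighted sum of the miss rate and
    the false-alarm rate at a given decision threshold (illustrative costs)."""
    misses = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    pos = max(1, sum(labels))
    neg = max(1, len(labels) - sum(labels))
    return miss_cost * misses / pos + fa_cost * false_alarms / neg


def select_fusion(cnn_scores, svm_scores, labels, weights, thresholds):
    """Grid-search the combining weight w and threshold t that minimize DCR
    on development-set scores; fused score = w*cnn + (1-w)*svm."""
    best = None
    for w in weights:
        fused = [w * c + (1 - w) * s for c, s in zip(cnn_scores, svm_scores)]
        for t in thresholds:
            cost = dcr(fused, labels, t)
            if best is None or cost < best[0]:
                best = (cost, w, t)
    return best  # (minimum DCR, chosen weight, chosen threshold)
```

At evaluation time the chosen `(w, t)` pair would be applied to each candidate space-time cube: the cube is reported as an event occurrence when `w * cnn + (1 - w) * svm >= t`.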

Original language: English (US)
State: Published - Jan 1 2009
Event: TREC Video Retrieval Evaluation, TRECVID 2009 - Gaithersburg, MD, United States
Duration: Nov 16 2009 - Nov 17 2009

Other

Other: TREC Video Retrieval Evaluation, TRECVID 2009
Country: United States
City: Gaithersburg, MD
Period: 11/16/09 - 11/17/09


ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Computer Vision and Pattern Recognition
  • Human-Computer Interaction
  • Software

Cite this

Yang, M., Ji, S., Xu, W., Wang, J., Lv, F., Yu, K., ... Huang, T. S. (2009). Detecting human actions in surveillance videos. Paper presented at TREC Video Retrieval Evaluation, TRECVID 2009, Gaithersburg, MD, United States.

@conference{71048d1fa476414ab630e80200ddc1bf,
title = "Detecting human actions in surveillance videos",
abstract = "This notebook paper summarizes Team NEC-UIUC's approaches for TRECVid 2009 Evaluation of Surveillance Event Detection. Our submissions include two types of systems. One system employs the brute force search method to test each space-time location in the video by a binary classifier on whether a specific event occurs. The other system takes advantage of human detection and tracking to avoid the costly brute force search and evaluates the candidate space-time cubes by combining 3D convolutional neural networks (CNN) and SVM classifiers based on bag-of-words local features to detect the presence of events of interest. Via thorough cross-validation on the development set, we select proper combining weights and thresholds to minimize the detection cost rates (DCR). Our systems achieve good performance on event categories which involve actions of a single person, e.g., CellToEar, ObjectPut, and Pointing.",
author = "Ming Yang and Shuiwang Ji and Wei Xu and Jinjun Wang and Fengjun Lv and Kai Yu and Yihong Gong and Mert Dikmen and Lin, {Dennis J.} and Huang, {Thomas S}",
year = "2009",
month = "1",
day = "1",
language = "English (US)",
note = "TREC Video Retrieval Evaluation, TRECVID 2009 ; Conference date: 16-11-2009 Through 17-11-2009",

}

TY - CONF

T1 - Detecting human actions in surveillance videos

AU - Yang, Ming

AU - Ji, Shuiwang

AU - Xu, Wei

AU - Wang, Jinjun

AU - Lv, Fengjun

AU - Yu, Kai

AU - Gong, Yihong

AU - Dikmen, Mert

AU - Lin, Dennis J.

AU - Huang, Thomas S

PY - 2009/1/1

Y1 - 2009/1/1

N2 - This notebook paper summarizes Team NEC-UIUC's approaches for TRECVid 2009 Evaluation of Surveillance Event Detection. Our submissions include two types of systems. One system employs the brute force search method to test each space-time location in the video by a binary classifier on whether a specific event occurs. The other system takes advantage of human detection and tracking to avoid the costly brute force search and evaluates the candidate space-time cubes by combining 3D convolutional neural networks (CNN) and SVM classifiers based on bag-of-words local features to detect the presence of events of interest. Via thorough cross-validation on the development set, we select proper combining weights and thresholds to minimize the detection cost rates (DCR). Our systems achieve good performance on event categories which involve actions of a single person, e.g., CellToEar, ObjectPut, and Pointing.

AB - This notebook paper summarizes Team NEC-UIUC's approaches for TRECVid 2009 Evaluation of Surveillance Event Detection. Our submissions include two types of systems. One system employs the brute force search method to test each space-time location in the video by a binary classifier on whether a specific event occurs. The other system takes advantage of human detection and tracking to avoid the costly brute force search and evaluates the candidate space-time cubes by combining 3D convolutional neural networks (CNN) and SVM classifiers based on bag-of-words local features to detect the presence of events of interest. Via thorough cross-validation on the development set, we select proper combining weights and thresholds to minimize the detection cost rates (DCR). Our systems achieve good performance on event categories which involve actions of a single person, e.g., CellToEar, ObjectPut, and Pointing.

UR - http://www.scopus.com/inward/record.url?scp=84905695098&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84905695098&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:84905695098

ER -