TY - JOUR
T1 - Vision-based action recognition of earthmoving equipment using spatio-temporal features and support vector machine classifiers
AU - Golparvar-Fard, Mani
AU - Heydarian, Arsalan
AU - Niebles, Juan Carlos
N1 - Funding Information:
The authors would like to thank the Virginia Tech Department of Planning, Design and Construction, as well as the Holder and Skanska construction companies, for providing access to their jobsites for comprehensive data collection. The support of the RAAMAC lab’s current and former members, Chris Bowling and David Cline, Hooman Rouhi, Hesham Barazi, Daniel Vaca, Marty Johnson, Nour Dabboussi, and Moshe Zelkowicz, is also appreciated. The work is supported by a grant from the Institute of Critical Technologies and Applied Science at Virginia Tech. The work is also partly supported by “el Patrimonio Autonomo Fondo Nacional de Financiamiento para la Ciencia, la Tecnologia y la Innovacion, Francisco Jose De Caldas” under Contract RC No. 0394-2012 with Universidad del Norte.
Copyright:
Copyright 2013 Elsevier B.V., All rights reserved.
PY - 2013/10
Y1 - 2013/10
N2 - Video recordings of earthmoving construction operations provide understandable data that can be used for benchmarking and analyzing their performance. These recordings further support project managers in taking corrective action on performance deviations and, in turn, improving operational efficiency. Despite these benefits, manual stopwatch studies of previously recorded videos can be labor-intensive, may suffer from observer bias, and are impractical after substantial periods of observation. This paper presents a new computer vision-based algorithm for recognizing single actions of earthmoving construction equipment. This is a particularly challenging task, as equipment can be partially occluded in site video streams and usually comes in a wide variety of sizes and appearances. The scale and pose of the equipment actions can also vary significantly with the camera configuration. In the proposed method, a video is initially represented as a collection of spatio-temporal visual features by extracting space-time interest points and describing each feature with a Histogram of Oriented Gradients (HOG). The algorithm automatically learns the distributions of the spatio-temporal features and action categories using a multi-class Support Vector Machine (SVM) classifier. This strategy handles noisy feature points arising from typical dynamic backgrounds. Given a video sequence captured from a fixed camera, the multi-class SVM classifier recognizes and localizes equipment actions. For the purpose of evaluation, a new video dataset is introduced that contains 859 sequences of excavator and truck actions. This dataset contains large variations in equipment pose and scale, and has varied backgrounds and levels of occlusion. The experimental results, with average accuracies of 86.33% and 98.33%, show that our supervised method outperforms previous algorithms for excavator and truck action recognition. The results hold promise for the applicability of the proposed method to construction activity analysis.
AB - Video recordings of earthmoving construction operations provide understandable data that can be used for benchmarking and analyzing their performance. These recordings further support project managers in taking corrective action on performance deviations and, in turn, improving operational efficiency. Despite these benefits, manual stopwatch studies of previously recorded videos can be labor-intensive, may suffer from observer bias, and are impractical after substantial periods of observation. This paper presents a new computer vision-based algorithm for recognizing single actions of earthmoving construction equipment. This is a particularly challenging task, as equipment can be partially occluded in site video streams and usually comes in a wide variety of sizes and appearances. The scale and pose of the equipment actions can also vary significantly with the camera configuration. In the proposed method, a video is initially represented as a collection of spatio-temporal visual features by extracting space-time interest points and describing each feature with a Histogram of Oriented Gradients (HOG). The algorithm automatically learns the distributions of the spatio-temporal features and action categories using a multi-class Support Vector Machine (SVM) classifier. This strategy handles noisy feature points arising from typical dynamic backgrounds. Given a video sequence captured from a fixed camera, the multi-class SVM classifier recognizes and localizes equipment actions. For the purpose of evaluation, a new video dataset is introduced that contains 859 sequences of excavator and truck actions. This dataset contains large variations in equipment pose and scale, and has varied backgrounds and levels of occlusion. The experimental results, with average accuracies of 86.33% and 98.33%, show that our supervised method outperforms previous algorithms for excavator and truck action recognition. The results hold promise for the applicability of the proposed method to construction activity analysis.
KW - Action recognition
KW - Activity analysis
KW - Computer vision
KW - Construction productivity
KW - Operational efficiency
KW - Time-studies
UR - http://www.scopus.com/inward/record.url?scp=84888007785&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84888007785&partnerID=8YFLogxK
U2 - 10.1016/j.aei.2013.09.001
DO - 10.1016/j.aei.2013.09.001
M3 - Article
AN - SCOPUS:84888007785
SN - 1474-0346
VL - 27
SP - 652
EP - 663
JO - Advanced Engineering Informatics
JF - Advanced Engineering Informatics
IS - 4
ER -