End-to-end vision-based detection, tracking and activity analysis of earthmoving equipment filmed at ground level

Dominic Roberts, Mani Golparvar-Fard

Research output: Contribution to journal › Article › peer-review


This paper presents a new benchmark dataset for validating vision-based methods that automatically identify visually distinctive working activities of excavators and dump trucks from individual frames of a video sequence. Our dataset consists of 10 videos of interacting pairs of construction equipment filmed at ground level, with accompanying ground truth annotations. These annotations consist of per-frame bounding boxes for each piece of equipment, each with an associated identity and activity label. Our videos depict an excavator interacting with one or more dump trucks. We also propose a deep learning-based method for detecting and tracking objects based on Convolutional Neural Networks (CNNs). The tracking trajectories are fed into a Hidden Markov Model (HMM) that automatically discovers and assigns activity labels for any observed object. Our HMM method leverages trajectories to train a Gaussian Mixture Model (GMM) with which we estimate the probability density function of each activity using Support Vector Machine (SVM) classifiers. The proposed HMM also models activity duration and the transition between activities. We show that our method can accurately distinguish between individual equipment working activities. Results show 97.43% detection Average Precision (AP) for excavators and 75.29% AP for dump trucks, as well as cross-category tracking accuracy of 81.94% and tracking precision of 87.45%. Separate experiments show activity analysis accuracy of 86.8% for excavators and 88.5% for dump trucks. Our results show that our method can accurately conduct activity analysis and can be fused with methods that detect motion trajectories to scale to the needs of practical applications.
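
The way an HMM turns noisy per-frame evidence into a consistent activity sequence can be illustrated with a minimal Viterbi decoding sketch. This is not the authors' implementation: the activity names, the per-frame probabilities (standing in for classifier outputs), and the "sticky" transition matrix are all hypothetical, and the high self-transition probability is only a simple stand-in for the duration modeling the abstract mentions.

```python
import math


def viterbi(log_emissions, log_transitions, log_prior):
    """Return the most likely state sequence given per-frame log emission scores."""
    n_states = len(log_prior)
    # best[s]: log-probability of the best path ending in state s at the current frame
    best = [log_prior[s] + log_emissions[0][s] for s in range(n_states)]
    backpointers = []
    for frame in log_emissions[1:]:
        new_best, pointers = [], []
        for s in range(n_states):
            # Pick the predecessor state that maximizes the path score into s
            prev = max(range(n_states), key=lambda p: best[p] + log_transitions[p][s])
            new_best.append(best[prev] + log_transitions[prev][s] + frame[s])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state
    state = max(range(n_states), key=lambda s: best[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def log_matrix(rows):
    """Convert a matrix of probabilities to log-probabilities."""
    return [[math.log(v) for v in row] for row in rows]


# Hypothetical activity labels, not the label set used in the paper
ACTIVITIES = ["idle", "working"]

# Assumed "sticky" transitions: an activity tends to persist across frames
transitions = log_matrix([[0.9, 0.1],
                          [0.1, 0.9]])
prior = [math.log(0.5), math.log(0.5)]

# Made-up per-frame activity probabilities; frame 2 briefly favors "working"
emissions = log_matrix([[0.8, 0.2], [0.7, 0.3], [0.4, 0.6], [0.8, 0.2],
                        [0.2, 0.8], [0.1, 0.9], [0.2, 0.8]])

path = viterbi(emissions, transitions, prior)
labels = [ACTIVITIES[s] for s in path]
print(labels)
```

In this toy input, taking the most probable label independently at each frame would flip to "working" at frame 2 and back again, whereas the decoded sequence stays "idle" until the sustained change at frame 4. That smoothing effect is the general reason to model transitions between activities rather than classify frames in isolation.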

Original language: English (US)
Article number: 102811
Journal: Automation in Construction
State: Published - Sep 2019
Externally published: Yes


Keywords

  • Activity analysis
  • Convolutional Neural Networks
  • Deep learning
  • Earthmoving operations
  • Hidden Markov Models

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Civil and Structural Engineering
  • Building and Construction

