End-to-end vision-based detection, tracking and activity analysis of earthmoving equipment filmed at ground level

Dominic Roberts, Mani Golparvar Fard

Research output: Contribution to journal › Article

Abstract

This paper presents a new benchmark dataset for validating vision-based methods that automatically identify visually distinctive working activities of excavators and dump trucks from individual frames of a video sequence. Our dataset consists of 10 videos of interacting pairs of construction equipment filmed at ground level, with accompanying ground truth annotations. These annotations consist of per-equipment, per-frame bounding boxes with associated identities and activity labels. Each video depicts an excavator interacting with one or more dump trucks. We also propose a deep learning-based method for detecting and tracking objects based on Convolutional Neural Networks (CNNs). The tracking trajectories are fed into a Hidden Markov Model (HMM) that automatically discovers and assigns activity labels for any observed object. Our HMM method leverages trajectories to train a Gaussian Mixture Model (GMM) with which we estimate the probability density function of each activity using Support Vector Machine (SVM) classifiers. The proposed HMM also models activity duration and the transitions between activities. We show that our method can accurately distinguish between individual equipment working activities. Results show 97.43% detection Average Precision (AP) for excavators and 75.29% AP for dump trucks, as well as cross-category tracking accuracy of 81.94% and tracking precision of 87.45%. Separate experiments show activity analysis accuracy of 86.8% for excavators and 88.5% for dump trucks. Our results show that our method can accurately conduct activity analysis and can be fused with methods that detect motion trajectories to scale to the needs of practical applications.
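To illustrate the activity-labelling step the abstract describes (tracking trajectories decoded into per-frame activity states by an HMM), the sketch below runs Viterbi decoding over hypothetical activity states. This is a minimal sketch, not the paper's implementation: the trajectory feature (centroid speed), the activity names, and every parameter value are invented for the example, and the paper's GMM emission densities are simplified here to single Gaussians per activity.

```python
import numpy as np

# Hypothetical activity states and per-activity emission parameters for a
# single 1-D trajectory feature (e.g. equipment centroid speed per frame).
ACTIVITIES = ["idle", "digging", "hauling"]   # illustrative labels only
MEANS = np.array([0.0, 0.5, 2.0])             # per-activity feature mean
STDS  = np.array([0.1, 0.3, 0.5])             # per-activity feature std
TRANS = np.array([[0.90, 0.05, 0.05],         # row-stochastic transition
                  [0.05, 0.90, 0.05],         # matrix: activities tend to
                  [0.05, 0.05, 0.90]])        # persist across frames
PRIOR = np.array([1/3, 1/3, 1/3])             # uniform initial distribution

def log_gauss(x, mu, sd):
    """Log-density of a scalar Gaussian, vectorized over activities."""
    return -0.5 * np.log(2 * np.pi * sd**2) - (x - mu)**2 / (2 * sd**2)

def viterbi(speeds):
    """Return the most likely activity label for each frame."""
    T, K = len(speeds), len(ACTIVITIES)
    logp = np.zeros((T, K))            # best log-probability ending in state k
    back = np.zeros((T, K), dtype=int) # backpointers to the previous state
    logp[0] = np.log(PRIOR) + log_gauss(speeds[0], MEANS, STDS)
    for t in range(1, T):
        # scores[i, j]: best path ending in i at t-1, then transitioning to j
        scores = logp[t - 1][:, None] + np.log(TRANS)
        back[t] = scores.argmax(axis=0)
        logp[t] = scores.max(axis=0) + log_gauss(speeds[t], MEANS, STDS)
    # Trace the best final state backwards through the backpointers.
    path = [int(logp[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [ACTIVITIES[s] for s in reversed(path)]

# A short synthetic speed profile: slow, mid-speed, fast, then slow again.
speeds = np.array([0.02, 0.05, 0.6, 0.55, 2.1, 1.9, 0.03])
print(viterbi(speeds))
# → ['idle', 'idle', 'digging', 'digging', 'hauling', 'hauling', 'idle']
```

The diagonal-heavy transition matrix is what makes the HMM smooth over noisy per-frame evidence, which is the same role duration and transition modelling plays in the method described above.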

Original language: English (US)
Article number: 102811
Journal: Automation in Construction
Volume: 105
DOI: 10.1016/j.autcon.2019.04.006
State: Published - Sep 2019

Keywords

  • Activity analysis
  • Convolutional Neural Networks
  • Deep learning
  • Earthmoving operations
  • Hidden Markov Models

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Civil and Structural Engineering
  • Building and Construction

Cite this

@article{e255a8439cd34e12b744f4debbeea339,
title = "End-to-end vision-based detection, tracking and activity analysis of earthmoving equipment filmed at ground level",
abstract = "This paper presents a new benchmark dataset for validating vision-based methods that automatically identify visually distinctive working activities of excavators and dump trucks from individual frames of a video sequence. Our dataset consists of 10 videos of interacting pairs of construction equipment filmed at ground level, with accompanying ground truth annotations. These annotations consist of per-equipment, per-frame bounding boxes with associated identities and activity labels. Each video depicts an excavator interacting with one or more dump trucks. We also propose a deep learning-based method for detecting and tracking objects based on Convolutional Neural Networks (CNNs). The tracking trajectories are fed into a Hidden Markov Model (HMM) that automatically discovers and assigns activity labels for any observed object. Our HMM method leverages trajectories to train a Gaussian Mixture Model (GMM) with which we estimate the probability density function of each activity using Support Vector Machine (SVM) classifiers. The proposed HMM also models activity duration and the transitions between activities. We show that our method can accurately distinguish between individual equipment working activities. Results show 97.43{\%} detection Average Precision (AP) for excavators and 75.29{\%} AP for dump trucks, as well as cross-category tracking accuracy of 81.94{\%} and tracking precision of 87.45{\%}. Separate experiments show activity analysis accuracy of 86.8{\%} for excavators and 88.5{\%} for dump trucks. Our results show that our method can accurately conduct activity analysis and can be fused with methods that detect motion trajectories to scale to the needs of practical applications.",
keywords = "Activity analysis, Convolutional Neural Networks, Deep learning, Earthmoving operations, Hidden Markov Models",
author = "Dominic Roberts and {Golparvar Fard}, Mani",
year = "2019",
month = "9",
doi = "10.1016/j.autcon.2019.04.006",
language = "English (US)",
volume = "105",
journal = "Automation in Construction",
issn = "0926-5805",
publisher = "Elsevier",
}
