Data mining to improve management and reduce costs of environmental remediation

Dara M. Farrell, Barbara S. Minsker, David Tcheng, Duanne Searsmith, Jane Bohn, Dennis Beckman

Research output: Contribution to journal (Article, peer-reviewed)


In this paper, data from 105 soil and groundwater remediation projects at BP gasoline service stations located in the state of Illinois were mined for lessons to reduce cost and improve management of remediation sites. Data mining software called D2K was used to train decision tree, stepwise linear regression and instance-based weighting models that relate hydrogeologic, sociopolitical, temporal and remedial factors in the site closure reports to remediation cost. The most important factors influencing cost were found to be the amount of soil excavated and the number of groundwater monitoring wells installed, suggesting that better management of excavation and well placement could result in significant cost savings. The best model for predicting cost classes (low, medium and high cost) was the decision tree, which had a prediction accuracy of approximately 73%. The misclassification of approximately 27% of the sites by even the best model suggests that remediation costs at service stations are influenced by other site-specific factors that may be difficult to accurately predict in advance.
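
The abstract does not describe how D2K induces its decision trees, so the following is only a minimal, generic sketch of the kind of model it mentions: continuous remediation costs are binned into low, medium and high classes, and a small CART-style tree using Gini impurity is fitted to numeric site factors. Several things here are assumptions, not details from the study: the tertile class boundaries, the Gini split criterion, the helper names (`cost_classes`, `build_tree`, `predict`, `accuracy`), and the two hypothetical features standing in for excavated soil volume and number of monitoring wells. The toy data are invented for illustration and are not from the 105 sites.

```python
from collections import Counter
from statistics import quantiles

def cost_classes(costs):
    # Assumption: class boundaries are the tertiles of the observed costs.
    q1, q2 = quantiles(costs, n=3)
    return ["low" if c <= q1 else "medium" if c <= q2 else "high" for c in costs]

def gini(labels):
    # Gini impurity: 0 for a pure node, larger for mixed classes.
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature_index, threshold) that lowers weighted Gini impurity most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        # Exclude the largest value so both sides of the split are non-empty.
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, max_depth=3):
    # A node is (feature, threshold, left_subtree, right_subtree); a leaf is a class label.
    split = best_split(rows, labels) if max_depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1))

def predict(tree, row):
    # Walk down the tree until a leaf (a class label string) is reached.
    while isinstance(tree, tuple):
        f, t, lo, hi = tree
        tree = lo if row[f] <= t else hi
    return tree

def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(labels)

# Invented toy data: (soil excavated, monitoring wells) per site, and cost per site.
rows = [(50, 3), (80, 4), (120, 5), (300, 6), (350, 8),
        (400, 7), (900, 10), (1000, 12), (1200, 11)]
costs = [40, 55, 60, 120, 130, 150, 300, 320, 400]

labels = cost_classes(costs)
tree = build_tree(rows, labels)
print(accuracy(tree, rows, labels))  # 1.0 (training-set accuracy)
print(predict(tree, (200, 5)))       # medium
```

The perfect score here is training-set accuracy on separable toy data and is therefore optimistic; an accuracy figure comparable to the one reported above would normally be estimated on held-out sites or by cross-validation. Stepwise linear regression and instance-based weighting, the other two model types mentioned, would be fitted to the same site factors and compared on the same cost classes.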

Original language: English (US)
Pages (from-to): 107-121
Number of pages: 15
Journal: Journal of Hydroinformatics
Issue number: 2
State: Published - Apr 2007
Externally published: Yes

Keywords

  • D2K
  • Data extraction
  • Data mining
  • Decision trees
  • Excavation
  • Remediation

ASJC Scopus subject areas

  • Civil and Structural Engineering
  • Water Science and Technology
  • Geotechnical Engineering and Engineering Geology
  • Atmospheric Science
