TY - JOUR
T1 - Integrating data sources to improve hydraulic head predictions
T2 - A hierarchical machine learning approach
AU - Michael, William J.
AU - Minsker, Barbara S.
AU - Tcheng, David
AU - Valocchi, Albert J.
AU - Quinn, John J.
PY - 2005/3
Y1 - 2005/3
N2 - This study investigates how machine learning methods can be used to improve hydraulic head predictions by integrating different types of data, including data from numerical models, in a hierarchical approach. A suite of four machine learning methods (decision trees, instance-based weighting, inverse distance weighting, and neural networks) are tested in several hierarchical configurations with different types of data from the 317/319 area at Argonne National Laboratory-East. The best machine learning model had a mean predicted head error 50% smaller than an existing MODFLOW numerical flow model, and a standard deviation of predicted head error 67% lower than the MODFLOW model, computed across all sampled locations used for calibrating the MODFLOW model. These predictions were obtained using decision trees trained with all historical quarterly data; the hourly head measurements were not as useful for prediction, most likely because of their poor spatial coverage. The results show promise for using hierarchical machine learning approaches to improve predictions and to identify the most essential types of data to guide future sampling efforts. Decision trees were also combined with an existing MODFLOW model to test their capabilities for updating numerical models to improve predictions as new data are collected. The combined model had a mean error 50% lower than the MODFLOW model alone. These results demonstrate that hierarchical machine learning approaches can be used to improve predictive performance of existing numerical models in areas with good data coverage. Further research is needed to compare this approach with methods such as Kalman filtering.
UR - http://www.scopus.com/inward/record.url?scp=17844372249&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=17844372249&partnerID=8YFLogxK
U2 - 10.1029/2003WR002802
DO - 10.1029/2003WR002802
M3 - Article
AN - SCOPUS:17844372249
SN - 0043-1397
VL - 41
SP - 1
EP - 14
JO - Water Resources Research
JF - Water Resources Research
IS - 3
ER -