TY - JOUR
T1 - A data-driven approach to develop physically sound predictors
T2 - Application to depth-averaged velocities on flows through submerged arrays of rigid cylinders
AU - Tinoco, R. O.
AU - Goldstein, E. B.
AU - Coco, G.
N1 - Publisher Copyright: © 2015. American Geophysical Union. All Rights Reserved.
PY - 2015/2
Y1 - 2015/2
AB - We use a machine learning approach to seek an accurate, physically sound predictor of the mean velocity for open-channel flow when submerged arrays of rigid cylinders (model vegetation) are present. A genetic programming (GP) routine is used to find a robust relationship between relevant properties of the model vegetation and flow parameters. We use published data from laboratory experiments covering a broad range of conditions to obtain an equation that matches the accuracy of other predictors from recent literature while having a less complex structure. We also investigate how different criteria for data selection, as well as the size of the data set used to train the algorithm, influence the accuracy of the resulting predictors. Our results show that proper use of machine learning techniques provides not only empirical correlations but can also yield physically sound models that represent the physical processes involved. We provide a clear, thorough example of the application of GP, its advantages and shortcomings, to encourage the use of data-driven techniques as part of the data analysis process, and to address common misconceptions of machine learning as a simple correlation technique or physically senseless statistical analysis.
KW - genetic programming
KW - machine learning
KW - open-channel flow
KW - vegetation resistance
UR - http://www.scopus.com/inward/record.url?scp=84924663862&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84924663862&partnerID=8YFLogxK
U2 - 10.1002/2014WR016380
DO - 10.1002/2014WR016380
M3 - Article
AN - SCOPUS:84924663862
SN - 0043-1397
VL - 51
SP - 1247
EP - 1263
JO - Water Resources Research
JF - Water Resources Research
IS - 2
ER -