Tuning support vector machines regression models improves prediction accuracy of soil properties in MIR spectroscopy

Leonardo Deiss, Andrew J. Margenot, Steve W. Culman, M. Scott Demyan

Research output: Contribution to journal › Article

Abstract

Estimating soil properties with diffuse reflectance infrared Fourier transform spectroscopy in the mid-infrared region (mid-DRIFTS) relies on statistical modeling (chemometrics) to predict soil properties from spectra. Modeling approaches can have major impacts on prediction accuracy. However, the impact of selecting the best parameters for an algorithm (tuning) to optimize non-linear models for predicting soil properties is relatively unexplored in soil science. This study aimed to evaluate the predictive performance of linear (partial least squares, PLS) and non-linear (support vector machines, SVM) multivariate regression models in estimating soil physical, chemical, and biological properties with mid-DRIFTS. We evaluated the impact of optimizing two hyperparameters (epsilon and cost), based on the noise tolerance in the ε-insensitive loss function of SVM models, using two contrasting and diverse sets of soils: one from northern Tanzania (n = 533) and another from the US Midwest (n = 400). Regression models were trained on calibration sets (75%) and tested on independent validation sets (25%) separately for each dataset. SVM outperformed PLS models for all tested soil properties (clay, sand, pH, total organic carbon, and permanganate oxidizable carbon) in both datasets. Tuning the hyperparameters epsilon and cost maintained or improved the prediction accuracy of SVM models, based on root mean squared errors (RMSE) of the independent validation sets. Tuned SVM hyperparameters differed among soil properties, and also for the same soil property between datasets, suggesting the need to parameterize non-linear models for specific soil properties and datasets. Optimizing SVM regression models in mid-DRIFTS improves prediction accuracy of soil properties and will therefore likely yield more robust predictions, even in datasets with diverse land uses, parent materials, and/or soil orders. We recommend that tuning be included as a routine step when using SVM to estimate soil properties.
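
The tuning workflow described in the abstract (a 75%/25% calibration/validation split, a search over epsilon and cost, and comparison by validation RMSE) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not the authors' implementation: the data are synthetic stand-ins for spectra, the model is a linear ε-SVR fitted by plain subgradient descent rather than the non-linear SVM used in the study, the cost term multiplies the mean (not the sum) of the loss to keep step sizes stable, and the grid values, the 3-fold cross-validation, and all function names are hypothetical.

```python
import math
import random

def eps_insensitive_loss(residual, epsilon):
    """Zero inside the tube |residual| <= epsilon, linear outside it."""
    return max(0.0, abs(residual) - epsilon)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def predict(model, X):
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]

def fit_linear_svr(X, y, epsilon, cost, epochs=200, lr=0.01):
    """Subgradient descent on 0.5*||w||^2 + cost * mean(eps-insensitive loss).
    Assumption: linear model and mean-scaled cost, chosen for brevity."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0      # gradient of the 0.5*||w||^2 penalty
        for xi, yi in zip(X, y):
            r = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            if abs(r) > epsilon:           # only points outside the tube contribute
                s = math.copysign(cost / n, r)
                grad_w = [g + s * xj for g, xj in zip(grad_w, xi)]
                grad_b += s
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def cv_rmse(X, y, epsilon, cost, k=3):
    """Mean RMSE over k contiguous folds of the calibration set."""
    fold = len(X) // k
    scores = []
    for i in range(k):
        lo, hi = i * fold, (i + 1) * fold
        model = fit_linear_svr(X[:lo] + X[hi:], y[:lo] + y[hi:], epsilon, cost)
        scores.append(rmse(y[lo:hi], predict(model, X[lo:hi])))
    return sum(scores) / k

# Synthetic stand-in data (hypothetical; not soil spectra).
rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(80)]
y = [2 * a - b + 0.5 * c + 1 + rng.gauss(0, 0.1) for a, b, c in X]

# 75% calibration / 25% independent validation, as in the abstract.
n_cal = int(0.75 * len(X))
X_cal, y_cal, X_val, y_val = X[:n_cal], y[:n_cal], X[n_cal:], y[n_cal:]

# Tune epsilon and cost using the calibration set only (illustrative grid).
grid = [(eps, c) for eps in (0.01, 0.1, 0.5) for c in (0.1, 1.0, 10.0)]
cv_scores = {params: cv_rmse(X_cal, y_cal, *params) for params in grid}
best_eps, best_cost = min(cv_scores, key=cv_scores.get)

# Refit with the selected pair and score once on the validation set.
tuned = fit_linear_svr(X_cal, y_cal, best_eps, best_cost)
tuned_rmse = rmse(y_val, predict(tuned, X_val))
print(f"best epsilon={best_eps}, cost={best_cost}, validation RMSE={tuned_rmse:.3f}")
```

Selecting the hyperparameters by cross-validation within the calibration set keeps the validation set independent, so the reported RMSE is not biased by the search itself. Rerunning the search for each soil property and each dataset mirrors the abstract's finding that the tuned values differ between them.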

Original language: English (US)
Article number: 114227
Journal: Geoderma
Volume: 365
State: Published - Apr 15 2020

Keywords

  • Error-grid
  • FTIR
  • Kernel
  • Machine-learning
  • RMSE

ASJC Scopus subject areas

  • Soil Science
