Using soil library hyperspectral reflectance and machine learning to predict soil organic carbon: Assessing potential of airborne and spaceborne optical soil sensing

Sheng Wang, Kaiyu Guan, Chenhui Zhang, Do Kyoung Lee, Andrew J. Margenot, Yufeng Ge, Jian Peng, Wang Zhou, Qu Zhou, Yizhi Huang

Research output: Contribution to journal › Article › peer-review


Soil organic carbon (SOC) is a key variable for determining soil functioning, ecosystem services, and global carbon cycles. Spectroscopy, particularly optical hyperspectral reflectance coupled with machine learning, can provide rapid, efficient, and cost-effective quantification of SOC. However, how best to exploit soil hyperspectral reflectance to predict SOC concentration, and how well airborne and satellite data can predict surface SOC at large scales, remain poorly understood. This study used a continental-scale soil laboratory spectral library (37,540 full-pedon 350–2500 nm reflectance spectra with SOC concentrations of 0–780 g·kg⁻¹ across the US) to thoroughly evaluate seven machine learning algorithms, namely Partial Least Squares Regression (PLSR), Random Forest (RF), K-Nearest Neighbors (KNN), Ridge, Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM), combined with four spectral treatments (original, vector normalization, continuum removal, and first-order derivative) for quantifying SOC concentration. Furthermore, using a coupled soil-vegetation-atmosphere radiative transfer model, we simulated twelve airborne and spaceborne hyperspectral/multispectral remote sensing datasets from surface bare soil laboratory spectra to evaluate their potential for estimating the SOC concentration of surface bare soils. Results show that LSTM achieved the best predictive performance in quantifying SOC concentration for the whole dataset (R² = 0.96, RMSE = 30.81 g·kg⁻¹), mineral soils (SOC ≤ 120 g·kg⁻¹, R² = 0.71, RMSE = 10.60 g·kg⁻¹), and organic soils (SOC > 120 g·kg⁻¹, R² = 0.78, RMSE = 62.31 g·kg⁻¹). Spectral data preprocessing, particularly the first-order derivative, improved the performance of PLSR, RF, Ridge, KNN, and ANN, but not of LSTM or CNN. We found that SOC models for mineral and organic soils should be kept separate, given their distinct spectral signatures. Finally, we identified the shortwave infrared as vital for airborne and spaceborne hyperspectral sensors to monitor surface SOC. This study highlights the high accuracy of LSTM with hyperspectral/multispectral data, and its ability to tolerate a certain level of noise (soil moisture < 0.4 m³·m⁻³, green leaf area < 0.3 m²·m⁻², plant residue < 0.4 m²·m⁻²), when quantifying surface SOC concentration. Forthcoming satellite hyperspectral missions such as Surface Biology and Geology (SBG) have high potential for future global soil carbon monitoring, while high-resolution satellite multispectral fusion data can be an alternative.
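As a rough illustration of two of the spectral treatments named in the abstract (vector normalization and first-order derivative), the R²/RMSE metrics, and the 120 g·kg⁻¹ mineral/organic split, the following is a minimal NumPy sketch. It is not the authors' code: all function names are hypothetical, the derivative is a plain finite difference (the abstract does not say which derivative method was used), and continuum removal and the learning algorithms themselves are left out.

```python
import numpy as np


def vector_normalize(spectra):
    """Scale each spectrum (one row per sample) to unit Euclidean length."""
    norms = np.linalg.norm(spectra, axis=1, keepdims=True)
    # Guard against all-zero rows to avoid division by zero.
    return spectra / np.where(norms == 0, 1.0, norms)


def first_derivative(spectra, wavelengths):
    """Finite-difference derivative of reflectance with respect to wavelength.

    Assumption: a simple numerical gradient stands in for whatever
    derivative filter the study actually applied.
    """
    return np.gradient(spectra, wavelengths, axis=1)


def r2_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for observed vs. predicted SOC concentration."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual = y_true - y_pred
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    return 1.0 - ss_res / ss_tot, rmse


def split_by_soc(soc, threshold=120.0):
    """Boolean masks for mineral (SOC <= threshold) and organic soils, in g/kg."""
    soc = np.asarray(soc, dtype=float)
    mineral = soc <= threshold
    return mineral, ~mineral
```

With these helpers, one would typically preprocess the spectra, fit a separate model on the rows selected by each mask, and report `r2_rmse` per group, which mirrors the abstract's separate figures for mineral and organic soils.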

Original language: English (US)
Article number: 112914
Journal: Remote Sensing of Environment
State: Published - Mar 15 2022

Keywords


  • Hyperspectral reflectance
  • Long short-term memory
  • Machine learning
  • Radiative transfer modeling
  • SBG
  • Soil organic carbon
  • Spectroscopy

ASJC Scopus subject areas

  • Soil Science
  • Geology
  • Computers in Earth Sciences

