Machine learning and deep learning for mineralogy interpretation and CO2 saturation estimation in geological carbon Storage: A case study in the Illinois Basin

Hongsheng Wang, Sherilyn Williams-Stroud, Dustin Crandall, Cheng Chen

Research output: Contribution to journal › Article › peer-review


Carbon capture and storage (CCS) is a promising approach to simultaneously maintaining energy security and reducing carbon dioxide (CO2) emissions under the current energy portfolio that is dominated by fossil fuel energy. Pre-injection formation characterization and post-injection CO2 monitoring are two critical tasks to guarantee storage efficiency in CCS. The CCS projects in the Illinois Basin, the first large-scale CO2 injection into saline aquifers in the United States, employed conventional and the latest pulsed neutron logging (PNL) tools for mineralogy interpretation and CO2 saturation estimation, which provide valuable references for future CCS projects. Because of the inherent fuzziness of petrophysical measurements and complex subsurface heterogeneity, interpreting well-logging data is time-consuming, and its accuracy can be user-biased. In recent years, data-driven methods have been widely used to capture the non-linear patterns between input features and interpretation results. This work applied and evaluated four commonly used machine learning (ML) models, namely ridge regression (RR), random forest (RF), gradient boosting regression (GBR), and support vector regression (SVR), and one deep learning (DL) model, the artificial neural network (ANN). We optimized the hyperparameters of the four ML models and the DL model using the simulated annealing algorithm and the grid search strategy, respectively. The input features of the mineralogy interpretation models were eleven conventional well-logging parameters, and the label data (i.e., ground truth) were the porosity and volumetric fractions of six minerals: quartz, feldspar, dolomite, calcite, clay, and iron minerals. The results demonstrated that the GBR and RF models were superior in predicting volumetric fractions of minerals and porosity; label data with low coefficient of variation (CV) values tended to yield better performance.
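The coefficient of variation mentioned above is the ratio of a label's standard deviation to its mean. The following is a minimal sketch of that calculation, not the study's code: the function name and the sample volumetric fractions are hypothetical, and the population standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Return standard deviation / mean for a non-empty sequence.

    Assumption: population standard deviation (pstdev) is used here.
    """
    mu = mean(values)
    if mu == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return pstdev(values) / mu


# Hypothetical volumetric fractions, for illustration only (not study data).
quartz = [0.60, 0.62, 0.58, 0.61, 0.59]
calcite = [0.01, 0.08, 0.00, 0.05, 0.02]

cv_quartz = coefficient_of_variation(quartz)
cv_calcite = coefficient_of_variation(calcite)
print(f"quartz CV = {cv_quartz:.3f}, calcite CV = {cv_calcite:.3f}")
```

In this toy data the abundant, stable label has a much lower CV than the sparse, erratic one, which is the kind of contrast the abstract associates with differences in model performance.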
For CO2 saturation estimation, the RF was the best-performing model, followed by SVR, ANN, GBR, and RR. Furthermore, we conducted feature importance ranking using the permutation importance algorithm and found that the formation sigma and well pressure were the most important features in this study. The study of CCS projects in the Illinois Basin bridges the gap between the limited knowledge and understanding of geological carbon storage and the increasing demand for reliable, cost-effective, and sustainable energy solutions.
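The permutation importance idea referred to above can be sketched in a few lines: shuffle one feature column at a time and measure how much the model's error grows. This is an illustrative sketch only, not the authors' implementation. The toy predictor, the synthetic data, the mean-squared-error metric, and all names below are assumptions introduced for the example.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a fitted regressor; ignores the third feature."""
    return 3.0 * row[0] + 0.5 * row[1]


# Synthetic data, for illustration only (not well-logging data).
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
for name, score in zip(["feature_0", "feature_1", "feature_2"], scores):
    print(f"{name}: {score:.4f}")
```

Because the toy predictor never reads the third feature, shuffling it leaves the error unchanged, while the feature with the largest coefficient shows the largest error increase. In practice the same procedure would be applied to a trained model on held-out data.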

Original language: English (US)
Article number: 130586
State: Published - Apr 1 2024
Externally published: Yes


Keywords

  • Conventional well-logging tool
  • Geological carbon storage
  • Illinois Basin
  • Pulsed neutron logging tool
  • Supervised machine learning

ASJC Scopus subject areas

  • General Chemical Engineering
  • Fuel Technology
  • Energy Engineering and Power Technology
  • Organic Chemistry

