TY - JOUR
T1 - Prediction of gully erosion susceptibility through the lens of the SHapley Additive exPlanations (SHAP) method using a stacking ensemble model
AU - Han, Jeongho
AU - Guzman, Jorge A.
AU - Chu, Maria L.
N1 - This research was funded by the US Department of Agriculture through the National Institute for Food and Agriculture (NIFA) award number 2019-67019-29884. NCAR/EOL provided Radar-derived QPE data (NCEP Stage IV) under the sponsorship of the National Science Foundation ( https://data.eol.ucar.edu/ ). LiDAR data was provided by the Illinois Geospatial Data Clearinghouse ( https://clearinghouse.igsc.illinois.edu/frontpage ), supported by the Prairie Research Institute , University of Illinois at Urbana-Champaign.
PY - 2025/5
Y1 - 2025/5
N2 - This study develops a novel explainable stacking ensemble model that combines the stacked generalization ensemble method with SHapley Additive exPlanations (SHAP) to enhance the prediction and interpretation of gully erosion susceptibility. Applied to Jefferson County, Illinois, our approach leverages Random Forest (RF), Gradient Boosting Machine (GBM), Logistic Regression (LR), and Deep Neural Networks (DNN) as both base and meta-learners in various configurations, resulting in 44 distinct stacking models. The comparative analysis demonstrated the superior predictive performance of the stacked models when evaluated at 200 randomly gully sites selected points based on LiDAR difference observations; all but three exceeded the highest area under the curve (AUC) value of 0.86 achieved by the best-performing base model (GBM). The LR stacking model, combining RF and GBM as base models with LR as the meta-learner, emerged as the most effective, achieving an AUC of 0.916. The resulting gully erosion susceptibility map by the LR stacking model classified 33 % of the agricultural land (89,208 ha) as the “very high” class, compared to 27 %, 87 %, 27 %, and 55 % predicted by individual RF, LR, GBM, and DNN models, respectively. Crucially, SHAP analysis elucidated how changes in feature values influence model behavior, considering feature interactions within both the base models and the meta-learner. The SHAP identified the annual leaf area index (LAI) as the most influential feature in both RF and GBM base models. Additionally, it highlights the significance of the GBM model in comparison to the RF base model in the final decision-making process of the stacking model. By offering a transparent mechanism to evaluate how different features and models contribute to final decisions, this approach can be extended to broader environmental management and policy-making contexts, facilitating more informed and responsible resource allocation.
AB - This study develops a novel explainable stacking ensemble model that combines the stacked generalization ensemble method with SHapley Additive exPlanations (SHAP) to enhance the prediction and interpretation of gully erosion susceptibility. Applied to Jefferson County, Illinois, our approach leverages Random Forest (RF), Gradient Boosting Machine (GBM), Logistic Regression (LR), and Deep Neural Networks (DNN) as both base and meta-learners in various configurations, resulting in 44 distinct stacking models. The comparative analysis demonstrated the superior predictive performance of the stacked models when evaluated at 200 randomly gully sites selected points based on LiDAR difference observations; all but three exceeded the highest area under the curve (AUC) value of 0.86 achieved by the best-performing base model (GBM). The LR stacking model, combining RF and GBM as base models with LR as the meta-learner, emerged as the most effective, achieving an AUC of 0.916. The resulting gully erosion susceptibility map by the LR stacking model classified 33 % of the agricultural land (89,208 ha) as the “very high” class, compared to 27 %, 87 %, 27 %, and 55 % predicted by individual RF, LR, GBM, and DNN models, respectively. Crucially, SHAP analysis elucidated how changes in feature values influence model behavior, considering feature interactions within both the base models and the meta-learner. The SHAP identified the annual leaf area index (LAI) as the most influential feature in both RF and GBM base models. Additionally, it highlights the significance of the GBM model in comparison to the RF base model in the final decision-making process of the stacking model. By offering a transparent mechanism to evaluate how different features and models contribute to final decisions, this approach can be extended to broader environmental management and policy-making contexts, facilitating more informed and responsible resource allocation.
KW - Explainable artificial intelligence
KW - Gully erosion susceptibility
KW - Machine learning
KW - Shapley additive explanations
KW - Stacked generalization
UR - https://www.scopus.com/pages/publications/105003232550
UR - https://www.scopus.com/pages/publications/105003232550#tab=citedBy
U2 - 10.1016/j.jenvman.2025.125478
DO - 10.1016/j.jenvman.2025.125478
M3 - Article
C2 - 40286423
AN - SCOPUS:105003232550
SN - 0301-4797
VL - 383
JO - Journal of Environmental Management
JF - Journal of Environmental Management
M1 - 125478
ER -