Toward building a transparent statistical model for improving crop yield prediction

Modeling rainfed corn in the U.S

Yan Li, Kaiyu Guan, Albert Yu, Bin Peng, Lei Zhao, Bo Li, Jian Peng

Research output: Contribution to journal › Article

Abstract

Statistical crop models have been a major tool in identifying critical drivers of crop yield, forecasting short-term crop yield, and assessing long-term climate change impacts on agricultural productivity. However, few studies focus specifically on fundamental issues encountered in developing a high-performance statistical crop model for yield prediction. Such issues include: how to select predictors and fitting functions, how to effectively address the spatiotemporal scale issue, whether it is beneficial to include satellite data as explanatory variables, and how to reconcile different model evaluation procedures. In this study, we present our statistical modeling practices for predicting rainfed corn yield in the Midwest U.S. and address the aforementioned issues through comprehensive diagnostic analysis. Our results show that vapor pressure deficit and precipitation at a monthly scale, in spline form with customized knots, define the “Best Climate-only” model among alternative climate variables (e.g., air temperature) and fitting functions (e.g., linear or polynomial), with an out-of-sample (leave-one-year-out) median R² of 0.79 and RMSE of 1.04 t/ha (16.6 bu/acre) from 2003 to 2016. Satellite variables, such as MODIS land surface temperature and Enhanced Vegetation Index (EVI), when used as predictors alone, reduce the model's RMSE to 0.93 t/ha (14.8 bu/acre). Adding satellite variables (i.e., EVI in polynomial form) to the “Best Climate-only” model gives the “Best Climate + EVI” model, which has the highest prediction performance in this study, with a median R² of 0.85 and RMSE of 0.90 t/ha (14.3 bu/acre). Such a model trained using all data (the so-called “global model”) in most cases leads to better predictions than the state-specific trained models. However, the global model's prediction performance exhibits considerable regional and interannual variations. The regionally varying performance is related to states’ spatiotemporal variability in yield: states with larger spatial yield variability show higher R², and states with smaller temporal yield variability show lower RMSE. Interannual variations in prediction performance are linked to yield variability and degree of wetness, with higher R² in years with larger yield variability but increasingly larger RMSE toward wetter years and extreme dry years. These spatial and temporal variations in model performance, together with inconsistent evaluation practices, undermine the comparability between statistical modeling studies. Alleviating such comparability issues requires more transparency and open data practices. The statistical model presented in this study provides a benchmark for further development and can be applied to future research related to yield prediction or assessment of climate change impact.
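The leave-one-year-out evaluation and the t/ha to bu/acre conversion mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the data are synthetic, a one-predictor straight-line fit stands in for the spline model, R² is taken as the coefficient of determination, per-year scores are summarized by their median, and the conversion assumes the standard 56 lb corn bushel. The names `t_ha_to_bu_acre`, `fit_line`, and `leave_one_year_out` are hypothetical.

```python
import math
import random
import statistics

# Assumed conversion constants for corn: 56 lb per bushel,
# 0.45359237 kg per lb, 2.4710538 acres per hectare.
KG_PER_BUSHEL_CORN = 56 * 0.45359237
ACRES_PER_HECTARE = 2.4710538


def t_ha_to_bu_acre(t_per_ha):
    """Convert a corn yield (or yield error) from t/ha to bu/acre."""
    return t_per_ha * 1000.0 / KG_PER_BUSHEL_CORN / ACRES_PER_HECTARE


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for the real model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def leave_one_year_out(records):
    """records: list of (year, x, y) tuples. Returns {year: (r2, rmse)}."""
    scores = {}
    for held_out in sorted({year for year, _, _ in records}):
        # Train on every other year, then predict the held-out year.
        train = [(x, y) for year, x, y in records if year != held_out]
        test = [(x, y) for year, x, y in records if year == held_out]
        a, b = fit_line([x for x, _ in train], [y for _, y in train])
        obs = [y for _, y in test]
        pred = [a + b * x for x, _ in test]
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        mean_obs = statistics.fmean(obs)
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        scores[held_out] = (1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(obs)))
    return scores


# Synthetic county-year records (hypothetical): one predictor, yield in t/ha.
rng = random.Random(0)
records = []
for year in range(2003, 2017):
    for _ in range(50):
        x = rng.uniform(0.5, 2.5)
        y = 12.0 - 2.0 * x + rng.gauss(0.0, 1.0)
        records.append((year, x, y))

scores = leave_one_year_out(records)
median_r2 = statistics.median(r2 for r2, _ in scores.values())
median_rmse = statistics.median(rmse for _, rmse in scores.values())
print(f"median R2 = {median_r2:.2f}, median RMSE = {median_rmse:.2f} t/ha "
      f"({t_ha_to_bu_acre(median_rmse):.1f} bu/acre)")
```

Under these conversion constants, `t_ha_to_bu_acre` maps the abstract's 1.04, 0.93, and 0.90 t/ha to 16.6, 14.8, and 14.3 bu/acre after rounding to one decimal, which matches the paired values reported above.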

Original language: English (US)
Pages (from-to): 55-65
Number of pages: 11
Journal: Field Crops Research
Volume: 234
DOIs: 10.1016/j.fcr.2019.02.005
State: Published - Mar 15 2019

Fingerprint

statistical models
crop yield
maize
prediction
corn
modeling
vegetation index
climate models
crop models
annual variation
climate modeling
climate change
crop
climate
moderate resolution imaging spectroradiometer
knots
vapor pressure
transparency
MODIS
satellite data

Keywords

  • Agriculture
  • Corn
  • Statistical model
  • Yield forecast
  • Yield prediction

ASJC Scopus subject areas

  • Agronomy and Crop Science
  • Soil Science

Cite this

Toward building a transparent statistical model for improving crop yield prediction: Modeling rainfed corn in the U.S. / Li, Yan; Guan, Kaiyu; Yu, Albert; Peng, Bin; Zhao, Lei; Li, Bo; Peng, Jian.

In: Field Crops Research, Vol. 234, 15.03.2019, p. 55-65.

@article{caf3938055c947c1a3248d48e7405c26,
title = "Toward building a transparent statistical model for improving crop yield prediction: Modeling rainfed corn in the U.S",
keywords = "Agriculture, Corn, Statistical model, Yield forecast, Yield prediction",
author = "Yan Li and Kaiyu Guan and Albert Yu and Bin Peng and Lei Zhao and Bo Li and Jian Peng",
year = "2019",
month = "3",
day = "15",
doi = "10.1016/j.fcr.2019.02.005",
language = "English (US)",
volume = "234",
pages = "55--65",
journal = "Field Crops Research",
issn = "0378-4290",
publisher = "Elsevier",

}

TY - JOUR

T1 - Toward building a transparent statistical model for improving crop yield prediction

T2 - Modeling rainfed corn in the U.S

AU - Li, Yan

AU - Guan, Kaiyu

AU - Yu, Albert

AU - Peng, Bin

AU - Zhao, Lei

AU - Li, Bo

AU - Peng, Jian

PY - 2019/3/15

Y1 - 2019/3/15

KW - Agriculture

KW - Corn

KW - Statistical model

KW - Yield forecast

KW - Yield prediction

UR - http://www.scopus.com/inward/record.url?scp=85061643850&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85061643850&partnerID=8YFLogxK

U2 - 10.1016/j.fcr.2019.02.005

DO - 10.1016/j.fcr.2019.02.005

M3 - Article

VL - 234

SP - 55

EP - 65

JO - Field Crops Research

JF - Field Crops Research

SN - 0378-4290

ER -