Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches

Yaping Cai, Kaiyu Guan, David Lobell, Andries B. Potgieter, Shaowen Wang, Jian Peng, Tianfang Xu, Senthold Asseng, Yongguang Zhang, Liangzhi You, Bin Peng

Research output: Contribution to journal › Article

Abstract

Wheat is the most important staple crop grown in Australia, and Australia is one of the top wheat-exporting countries globally. Timely and reliable wheat yield prediction in Australia is important for regional and global food security. Prior studies use climate data, satellite data, or a combination of the two to build empirical models to predict crop yield. However, although combining climate and satellite data improves the performance of empirical yield prediction, the contributions from the different data sources are still not clear. In addition, how regression-based methods compare with various machine-learning-based methods in yield prediction performance is not well understood and needs in-depth investigation. This work integrated various sources of data to predict wheat yield across Australia from 2000 to 2014 at the statistical division (SD) level. We adopted a well-known regression method (LASSO, as a benchmark) and three mainstream machine learning methods (support vector machine, random forest, and neural network) to build various empirical models for yield prediction. For satellite data, we used the enhanced vegetation index (EVI) from MODIS and solar-induced chlorophyll fluorescence (SIF) from GOME-2 and SCIAMACHY as metrics to approximate crop productivity. The machine-learning-based methods outperform the regression method in modeling crop yield. Our results confirm that combining climate and satellite data can achieve high performance of yield prediction at the SD level (R² ≈ 0.75). The satellite data track crop growth condition and gradually capture the variability of yield as the growing season evolves, and their contributions to yield prediction usually saturate at the peak of the growing season. Climate data provide extra and unique information beyond what the satellite data offer for yield prediction, and our empirical modeling work shows that the added value of climate variables exists across the whole season, not only at certain stages. We also find that using EVI as an input can achieve better performance in yield prediction than SIF, primarily due to the large noise in the satellite-based SIF data (i.e., coarse resolution in both space and time). We also explored the potential for timely wheat yield prediction in Australia, and we can achieve the optimal prediction performance with a lead time of approximately two months before wheat maturity. The proposed methodology in this paper can be extended to different crops and different regions for crop yield prediction.
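
The R² quoted in the abstract is read here as the usual coefficient of determination, 1 - SS_res / SS_tot; the abstract does not spell out the evaluation protocol, so that reading is an assumption. The following is a minimal sketch of the metric only. The function name `r_squared` and the yield values are hypothetical illustrations and are not taken from the paper.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = fmean(observed)
    # Residual sum of squares: squared error of the model's predictions.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: squared error of always predicting the mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical yields (t/ha) for five regions; NOT data from the paper.
observed = [1.2, 1.8, 2.1, 1.5, 2.4]
predicted = [1.3, 1.7, 2.0, 1.7, 2.2]
print(round(r_squared(observed, predicted), 3))
```

A value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than predicting the mean observed yield everywhere.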

Original language: English (US)
Pages (from-to): 144-159
Number of pages: 16
Journal: Agricultural and Forest Meteorology
Volume: 274
DOIs: 10.1016/j.agrformet.2019.03.010
State: Published - Aug 15 2019

Fingerprint

artificial intelligence
wheat
climate
prediction
remote sensing
satellite data
crop yield
crop
chlorophyll
fluorescence
crops
vegetation index
methodology
machine learning
growing season
SCIAMACHY
GOME
moderate resolution imaging spectroradiometer
staples
value added

Keywords

  • Crop yield prediction
  • Enhanced vegetation index
  • Machine learning
  • Solar-induced fluorescence
  • Wheat

ASJC Scopus subject areas

  • Forestry
  • Global and Planetary Change
  • Agronomy and Crop Science
  • Atmospheric Science

Cite this

Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. / Cai, Yaping; Guan, Kaiyu; Lobell, David; Potgieter, Andries B.; Wang, Shaowen; Peng, Jian; Xu, Tianfang; Asseng, Senthold; Zhang, Yongguang; You, Liangzhi; Peng, Bin.

In: Agricultural and Forest Meteorology, Vol. 274, 15.08.2019, p. 144-159.

Cai, Yaping ; Guan, Kaiyu ; Lobell, David ; Potgieter, Andries B. ; Wang, Shaowen ; Peng, Jian ; Xu, Tianfang ; Asseng, Senthold ; Zhang, Yongguang ; You, Liangzhi ; Peng, Bin. / Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches. In: Agricultural and Forest Meteorology. 2019 ; Vol. 274. pp. 144-159.
@article{3a106e25157f458fbb40ccc270b5867a,
title = "Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches",
abstract = "Wheat is the most important staple crop grown in Australia, and Australia is one of the top wheat-exporting countries globally. Timely and reliable wheat yield prediction in Australia is important for regional and global food security. Prior studies use climate data, satellite data, or a combination of the two to build empirical models to predict crop yield. However, although combining climate and satellite data improves the performance of empirical yield prediction, the contributions from the different data sources are still not clear. In addition, how regression-based methods compare with various machine-learning-based methods in yield prediction performance is not well understood and needs in-depth investigation. This work integrated various sources of data to predict wheat yield across Australia from 2000 to 2014 at the statistical division (SD) level. We adopted a well-known regression method (LASSO, as a benchmark) and three mainstream machine learning methods (support vector machine, random forest, and neural network) to build various empirical models for yield prediction. For satellite data, we used the enhanced vegetation index (EVI) from MODIS and solar-induced chlorophyll fluorescence (SIF) from GOME-2 and SCIAMACHY as metrics to approximate crop productivity. The machine-learning-based methods outperform the regression method in modeling crop yield. Our results confirm that combining climate and satellite data can achieve high performance of yield prediction at the SD level (R$^{2}$ $\approx$ 0.75). The satellite data track crop growth condition and gradually capture the variability of yield as the growing season evolves, and their contributions to yield prediction usually saturate at the peak of the growing season. Climate data provide extra and unique information beyond what the satellite data offer for yield prediction, and our empirical modeling work shows that the added value of climate variables exists across the whole season, not only at certain stages. We also find that using EVI as an input can achieve better performance in yield prediction than SIF, primarily due to the large noise in the satellite-based SIF data (i.e., coarse resolution in both space and time). We also explored the potential for timely wheat yield prediction in Australia, and we can achieve the optimal prediction performance with a lead time of approximately two months before wheat maturity. The proposed methodology in this paper can be extended to different crops and different regions for crop yield prediction.",
keywords = "Crop yield prediction, Enhanced vegetation index, Machine learning, Solar-induced fluorescence, Wheat",
author = "Yaping Cai and Kaiyu Guan and David Lobell and Potgieter, {Andries B.} and Shaowen Wang and Jian Peng and Tianfang Xu and Senthold Asseng and Yongguang Zhang and Liangzhi You and Bin Peng",
year = "2019",
month = "8",
day = "15",
doi = "10.1016/j.agrformet.2019.03.010",
language = "English (US)",
volume = "274",
pages = "144--159",
journal = "Agricultural and Forest Meteorology",
issn = "0168-1923",
publisher = "Elsevier",

}

TY - JOUR

T1 - Integrating satellite and climate data to predict wheat yield in Australia using machine learning approaches

AU - Cai, Yaping

AU - Guan, Kaiyu

AU - Lobell, David

AU - Potgieter, Andries B.

AU - Wang, Shaowen

AU - Peng, Jian

AU - Xu, Tianfang

AU - Asseng, Senthold

AU - Zhang, Yongguang

AU - You, Liangzhi

AU - Peng, Bin

PY - 2019/8/15

Y1 - 2019/8/15

AB - Wheat is the most important staple crop grown in Australia, and Australia is one of the top wheat-exporting countries globally. Timely and reliable wheat yield prediction in Australia is important for regional and global food security. Prior studies use climate data, satellite data, or a combination of the two to build empirical models to predict crop yield. However, although combining climate and satellite data improves the performance of empirical yield prediction, the contributions from the different data sources are still not clear. In addition, how regression-based methods compare with various machine-learning-based methods in yield prediction performance is not well understood and needs in-depth investigation. This work integrated various sources of data to predict wheat yield across Australia from 2000 to 2014 at the statistical division (SD) level. We adopted a well-known regression method (LASSO, as a benchmark) and three mainstream machine learning methods (support vector machine, random forest, and neural network) to build various empirical models for yield prediction. For satellite data, we used the enhanced vegetation index (EVI) from MODIS and solar-induced chlorophyll fluorescence (SIF) from GOME-2 and SCIAMACHY as metrics to approximate crop productivity. The machine-learning-based methods outperform the regression method in modeling crop yield. Our results confirm that combining climate and satellite data can achieve high performance of yield prediction at the SD level (R² ≈ 0.75). The satellite data track crop growth condition and gradually capture the variability of yield as the growing season evolves, and their contributions to yield prediction usually saturate at the peak of the growing season. Climate data provide extra and unique information beyond what the satellite data offer for yield prediction, and our empirical modeling work shows that the added value of climate variables exists across the whole season, not only at certain stages. We also find that using EVI as an input can achieve better performance in yield prediction than SIF, primarily due to the large noise in the satellite-based SIF data (i.e., coarse resolution in both space and time). We also explored the potential for timely wheat yield prediction in Australia, and we can achieve the optimal prediction performance with a lead time of approximately two months before wheat maturity. The proposed methodology in this paper can be extended to different crops and different regions for crop yield prediction.

KW - Crop yield prediction

KW - Enhanced vegetation index

KW - Machine learning

KW - Solar-induced fluorescence

KW - Wheat

UR - http://www.scopus.com/inward/record.url?scp=85065431139&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85065431139&partnerID=8YFLogxK

U2 - 10.1016/j.agrformet.2019.03.010

DO - 10.1016/j.agrformet.2019.03.010

M3 - Article

VL - 274

SP - 144

EP - 159

JO - Agricultural and Forest Meteorology

JF - Agricultural and Forest Meteorology

SN - 0168-1923

ER -