Hyperspectral leaf reflectance as proxy for photosynthetic capacities: An ensemble approach based on multiple machine learning algorithms

Peng Fu, Katherine Meacham-Hensold, Kaiyu Guan, Carl Bernacchi

Research output: Contribution to journal › Article

Abstract

Global agricultural production is challenged by increasing demands from a rising population and a changing climate, which may be alleviated through development of genetically improved crop cultivars. Research into increasing photosynthetic energy conversion efficiency has proposed many strategies to improve production but has yet to yield real-world solutions, largely because of a phenotyping bottleneck. Partial least squares regression (PLSR) is a statistical technique that is increasingly used to relate hyperspectral reflectance to key photosynthetic capacities associated with carbon uptake (maximum carboxylation rate of Rubisco, Vc,max) and conversion of light energy (maximum electron transport rate supporting RuBP regeneration, Jmax) to alleviate this bottleneck. However, its performance varies significantly across plant species, regions, and growth environments. To cope with the heterogeneous performance of PLSR, this study aims to develop a new approach to estimate photosynthetic capacities. A framework was developed that combines six machine learning algorithms, namely artificial neural network (ANN), support vector machine (SVM), least absolute shrinkage and selection operator (LASSO), random forest (RF), Gaussian process (GP), and PLSR, to optimize high-throughput analysis of the two photosynthetic variables. Six tobacco genotypes, including both transgenic and wild-type lines, with a range of photosynthetic capacities were used to test the framework. Leaf reflectance spectra were measured from 400 to 2500 nm using a high-spectral-resolution spectroradiometer. Corresponding photosynthesis vs. intercellular CO₂ concentration response curves were measured for each leaf using a leaf gas-exchange system. Results suggested that the mean R² value of the six regression techniques for predicting Vc,max (Jmax) ranged from 0.60 (0.45) to 0.65 (0.56), with the mean RMSE value varying from 47.1 (40.1) to 54.0 (44.7) μmol m⁻² s⁻¹. Regression stacking for Vc,max (Jmax) performed better than the individual regression techniques, with increases in R² of 0.1 (0.08) and decreases in RMSE of 4.1 (6.6) μmol m⁻² s⁻¹, equal to an 8% (15%) reduction in RMSE. The better predictive performance of the regression stacking is likely attributable to the varying coefficients (or weights) in the level-2 model (the LASSO model) and the diverse ability of each individual regression technique to utilize spectral information for the best modeling performance. Further refinements can be made to apply this stacked regression technique to other plant phenotypic traits.

Original language: English (US)
Article number: 730
Journal: Frontiers in Plant Science
Volume: 10
DOI: 10.3389/fpls.2019.00730
State: Published - May 31 2019

Fingerprint

artificial intelligence
reflectance
least squares
operator regions
leaves
shrinkage
spectroradiometers
methodology
energy conversion
carboxylation
ribulose-bisphosphate carboxylase
neural networks
electron transfer
gas exchange
tobacco
climate change
photosynthesis
genetically modified organisms
agriculture
uptake mechanisms

Keywords

  • Gas exchange system
  • High-throughput phenotyping
  • Machine learning
  • Photosynthesis
  • Stacked regression

ASJC Scopus subject areas

  • Plant Science

Cite this

@article{927983ce3fcc4ee281baa17d2df1efad,
title = "Hyperspectral leaf reflectance as proxy for photosynthetic capacities: An ensemble approach based on multiple machine learning algorithms",
keywords = "Gas exchange system, High-throughput phenotyping, Machine learning, Photosynthesis, Stacked regression",
author = "Peng Fu and Katherine Meacham-Hensold and Kaiyu Guan and Carl Bernacchi",
year = "2019",
month = "5",
day = "31",
doi = "10.3389/fpls.2019.00730",
language = "English (US)",
volume = "10",
journal = "Frontiers in Plant Science",
issn = "1664-462X",
publisher = "Frontiers Media S. A.",

}

TY - JOUR

T1 - Hyperspectral leaf reflectance as proxy for photosynthetic capacities

T2 - An ensemble approach based on multiple machine learning algorithms

AU - Fu, Peng

AU - Meacham-Hensold, Katherine

AU - Guan, Kaiyu

AU - Bernacchi, Carl

PY - 2019/5/31

Y1 - 2019/5/31

KW - Gas exchange system

KW - High-throughput phenotyping

KW - Machine learning

KW - Photosynthesis

KW - Stacked regression

UR - http://www.scopus.com/inward/record.url?scp=85068439754&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068439754&partnerID=8YFLogxK

U2 - 10.3389/fpls.2019.00730

DO - 10.3389/fpls.2019.00730

M3 - Article

AN - SCOPUS:85068439754

VL - 10

JO - Frontiers in Plant Science

JF - Frontiers in Plant Science

SN - 1664-462X

M1 - 730

ER -