Efficient use of historical data for genomic selection: A case study of stem rust resistance in wheat

J. Rutkoski, R. P. Singh, J. Huerta-Espino, S. Bhavani, J. Poland, J. L. Jannink, M. E. Sorrells

Research output: Contribution to journal › Article

Abstract

Genomic selection (GS) is a methodology that can improve crop breeding efficiency. To implement GS, a training population (TP) with phenotypic and genotypic data is required to train a statistical model used to predict genotyped selection candidates (SCs). A key factor impacting prediction accuracy is the relationship between the TP and the SCs. This study used empirical data for quantitative adult plant resistance to stem rust of wheat (Triticum aestivum L.) to investigate the utility of a historical TP (TPH) compared with a population-specific TP (TPPS), the potential for TPH optimization, and the utility of TPH data when close relative data is available for training. We found that, depending on the population size, a TPPS was 1.5 to 4.4 times more accurate than a TPH, and TPH optimization based on the mean of the generalized coefficient of determination or prediction error variance enabled the selection of subsets that led to significantly higher accuracy than randomly selected subsets. Retaining historical data when data on close relatives were available led to an 11.9% increase in accuracy, at best, and a 12% decrease in accuracy, at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is very large and of high heritability. Training population optimization would be useful for the identification of TPH subsets to phenotype additional traits. However, after model updating, discarding historical data may be warranted. More studies are needed to determine if these observations represent general trends.
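The abstract reports that choosing historical training subsets by the mean prediction error variance (PEV) or mean coefficient of determination (CD) outperformed random subsets. The Python sketch below only illustrates that idea under simplifying assumptions that are not taken from the paper: a GBLUP-type model with no fixed effects, a known variance ratio lam = σ²e/σ²g, PEV expressed in units of σ²g, a per-candidate CD of 1 - PEV/K_ii rather than the generalized contrast-based CD, a relationship matrix K built from simulated markers, and a simple random-exchange search. The function names `pev_and_cd` and `optimize_training_set` are hypothetical; the criterion and search algorithm actually used in the study may differ.

```python
import numpy as np


def pev_and_cd(K, train, cand, lam):
    """Per-candidate PEV and CD for a simplified GBLUP model.

    Assumptions (not from the paper): y_T = g_T + e, g ~ N(0, K * s2g),
    e ~ N(0, I * s2e), no fixed effects, known lam = s2e / s2g,
    and PEV reported in units of s2g.
    """
    A = K[np.ix_(train, train)] + lam * np.eye(len(train))
    K_ct = K[np.ix_(cand, train)]
    k_cc = np.diag(K)[cand]
    # Var(g_C | y_T) = K_CC - K_CT (K_TT + lam*I)^-1 K_TC; keep only the diagonal.
    reduction = np.einsum("ij,ji->i", K_ct, np.linalg.solve(A, K_ct.T))
    pev = k_cc - reduction
    cd = 1.0 - pev / k_cc  # share of each candidate's genetic variance explained
    return pev, cd


def optimize_training_set(K, pool, cand, n_train, lam, n_iter=500, seed=0):
    """Random-exchange search for a training subset that minimizes mean PEV."""
    rng = np.random.default_rng(seed)
    pool = np.asarray(pool)
    current = rng.choice(pool, size=n_train, replace=False)
    start = best = pev_and_cd(K, current, cand, lam)[0].mean()
    for _ in range(n_iter):
        outside = np.setdiff1d(pool, current)
        if outside.size == 0:
            break
        trial = current.copy()
        # Swap one training individual for one not yet in the subset.
        trial[rng.integers(n_train)] = rng.choice(outside)
        score = pev_and_cd(K, trial, cand, lam)[0].mean()
        if score < best:  # keep the swap only if mean PEV improves
            current, best = trial, score
    return np.sort(current), start, best


# Simulated data, for illustration only: 100 "historical" lines, 20 candidates.
rng = np.random.default_rng(1)
markers = rng.integers(0, 3, size=(120, 300)).astype(float)
Zc = markers - markers.mean(axis=0)  # center marker columns
K = Zc @ Zc.T / Zc.shape[1]          # simple relationship matrix (assumed scaling)
pool, cand = np.arange(100), np.arange(100, 120)

subset, start, best = optimize_training_set(K, pool, cand, n_train=30, lam=1.0)
_, cd = pev_and_cd(K, subset, cand, lam=1.0)
print(f"mean PEV: random start {start:.3f}, optimized {best:.3f}")
print(f"mean CD of optimized subset: {cd.mean():.3f}")
```

An exchange search like this only reaches a local optimum, and its result depends on the seed and the number of iterations; minimizing mean PEV and maximizing mean CD generally favor similar subsets but need not coincide.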

Original language: English (US)
Pages (from-to): 1-10
Number of pages: 10
Journal: Plant Genome
Volume: 8
Issue number: 1
DOIs: https://doi.org/10.3835/plantgenome2014.09.0046
State: Published - Jan 1 2015
Externally published: Yes

ASJC Scopus subject areas

  • Genetics
  • Agronomy and Crop Science
  • Plant Science

Cite this

Rutkoski, J., Singh, R. P., Huerta-Espino, J., Bhavani, S., Poland, J., Jannink, J. L., & Sorrells, M. E. (2015). Efficient use of historical data for genomic selection: A case study of stem rust resistance in wheat. Plant Genome, 8(1), 1-10. https://doi.org/10.3835/plantgenome2014.09.0046

Efficient use of historical data for genomic selection: A case study of stem rust resistance in wheat. / Rutkoski, J.; Singh, R. P.; Huerta-Espino, J.; Bhavani, S.; Poland, J.; Jannink, J. L.; Sorrells, M. E.

In: Plant Genome, Vol. 8, No. 1, 01.01.2015, p. 1-10.

Rutkoski, J, Singh, RP, Huerta-Espino, J, Bhavani, S, Poland, J, Jannink, JL & Sorrells, ME 2015, 'Efficient use of historical data for genomic selection: A case study of stem rust resistance in wheat', Plant Genome, vol. 8, no. 1, pp. 1-10. https://doi.org/10.3835/plantgenome2014.09.0046
Rutkoski, J.; Singh, R. P.; Huerta-Espino, J.; Bhavani, S.; Poland, J.; Jannink, J. L.; Sorrells, M. E. / Efficient use of historical data for genomic selection: A case study of stem rust resistance in wheat. In: Plant Genome. 2015; Vol. 8, No. 1. pp. 1-10.
@article{2b66c35cfe3549d5806f3d75605dfaee,
title = "Efficient use of historical data for genomic selection: A case study of stem rust resistance in wheat",
abstract = "Genomic selection (GS) is a methodology that can improve crop breeding efficiency. To implement GS, a training population (TP) with phenotypic and genotypic data is required to train a statistical model used to predict genotyped selection candidates (SCs). A key factor impacting prediction accuracy is the relationship between the TP and the SCs. This study used empirical data for quantitative adult plant resistance to stem rust of wheat (Triticum aestivum L.) to investigate the utility of a historical TP (TPH) compared with a population-specific TP (TPPS), the potential for TPH optimization, and the utility of TPH data when close relative data is available for training. We found that, depending on the population size, a TPPS was 1.5 to 4.4 times more accurate than a TPH, and TPH optimization based on the mean of the generalized coefficient of determination or prediction error variance enabled the selection of subsets that led to significantly higher accuracy than randomly selected subsets. Retaining historical data when data on close relatives were available led to an 11.9{\%} increase in accuracy, at best, and a 12{\%} decrease in accuracy, at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is very large and of high heritability. Training population optimization would be useful for the identification of TPH subsets to phenotype additional traits. However, after model updating, discarding historical data may be warranted. More studies are needed to determine if these observations represent general trends.",
author = "Rutkoski, J. and Singh, R. P. and Huerta-Espino, J. and Bhavani, S. and Poland, J. and Jannink, J. L. and Sorrells, M. E.",
year = "2015",
month = jan,
day = "1",
doi = "10.3835/plantgenome2014.09.0046",
language = "English (US)",
volume = "8",
pages = "1--10",
journal = "Plant Genome",
issn = "1940-3372",
publisher = "Crop Science Society of America",
number = "1",

}

TY  - JOUR

T1  - Efficient use of historical data for genomic selection: A case study of stem rust resistance in wheat

AU  - Rutkoski, J.

AU  - Singh, R. P.

AU  - Huerta-Espino, J.

AU  - Bhavani, S.

AU  - Poland, J.

AU  - Jannink, J. L.

AU  - Sorrells, M. E.

PY  - 2015/1/1

Y1  - 2015/1/1

N2  - Genomic selection (GS) is a methodology that can improve crop breeding efficiency. To implement GS, a training population (TP) with phenotypic and genotypic data is required to train a statistical model used to predict genotyped selection candidates (SCs). A key factor impacting prediction accuracy is the relationship between the TP and the SCs. This study used empirical data for quantitative adult plant resistance to stem rust of wheat (Triticum aestivum L.) to investigate the utility of a historical TP (TPH) compared with a population-specific TP (TPPS), the potential for TPH optimization, and the utility of TPH data when close relative data is available for training. We found that, depending on the population size, a TPPS was 1.5 to 4.4 times more accurate than a TPH, and TPH optimization based on the mean of the generalized coefficient of determination or prediction error variance enabled the selection of subsets that led to significantly higher accuracy than randomly selected subsets. Retaining historical data when data on close relatives were available led to an 11.9% increase in accuracy, at best, and a 12% decrease in accuracy, at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is very large and of high heritability. Training population optimization would be useful for the identification of TPH subsets to phenotype additional traits. However, after model updating, discarding historical data may be warranted. More studies are needed to determine if these observations represent general trends.

AB  - Genomic selection (GS) is a methodology that can improve crop breeding efficiency. To implement GS, a training population (TP) with phenotypic and genotypic data is required to train a statistical model used to predict genotyped selection candidates (SCs). A key factor impacting prediction accuracy is the relationship between the TP and the SCs. This study used empirical data for quantitative adult plant resistance to stem rust of wheat (Triticum aestivum L.) to investigate the utility of a historical TP (TPH) compared with a population-specific TP (TPPS), the potential for TPH optimization, and the utility of TPH data when close relative data is available for training. We found that, depending on the population size, a TPPS was 1.5 to 4.4 times more accurate than a TPH, and TPH optimization based on the mean of the generalized coefficient of determination or prediction error variance enabled the selection of subsets that led to significantly higher accuracy than randomly selected subsets. Retaining historical data when data on close relatives were available led to an 11.9% increase in accuracy, at best, and a 12% decrease in accuracy, at worst, depending on the heritability. We conclude that historical data could be used successfully to initiate a GS program, especially if the dataset is very large and of high heritability. Training population optimization would be useful for the identification of TPH subsets to phenotype additional traits. However, after model updating, discarding historical data may be warranted. More studies are needed to determine if these observations represent general trends.

UR  - http://www.scopus.com/inward/record.url?scp=84925933750&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=84925933750&partnerID=8YFLogxK

U2  - 10.3835/plantgenome2014.09.0046

DO  - 10.3835/plantgenome2014.09.0046

M3  - Article

AN  - SCOPUS:84925933750

VL  - 8

SP  - 1

EP  - 10

JO  - Plant Genome

JF  - Plant Genome

SN  - 1940-3372

IS  - 1

ER  -