TY - JOUR
T1 - Genotype imputation for soybean nested association mapping population to improve precision of QTL detection
AU - Chen, Linfeng
AU - Yang, Shouping
AU - Araya, Susan
AU - Quigley, Charles
AU - Taliercio, Earl
AU - Mian, Rouf
AU - Specht, James E.
AU - Diers, Brian W.
AU - Song, Qijian
N1 - Publisher Copyright:
© 2022, This is a U.S. government work and not under copyright protection in the U.S.; foreign copyright protection may apply.
PY - 2022/5
Y1 - 2022/5
N2 - Key message: Software for high imputation accuracy in soybean was identified. Imputed dataset could significantly reduce the interval of genomic regions controlling traits, thus greatly improve the efficiency of candidate gene identification. Abstract: Genotype imputation is a strategy to increase marker density of existing datasets without additional genotyping. We compared imputation performance of software BEAGLE 5.0, IMPUTE 5 and AlphaPlantImpute and tested software parameters that may help to improve imputation accuracy in soybean populations. Several factors including marker density, extent of linkage disequilibrium (LD), minor allele frequency (MAF), etc., were examined for their effects on imputation accuracy across different software. Our results showed that AlphaPlantImpute had a higher imputation accuracy than BEAGLE 5.0 or IMPUTE 5 tested in each soybean family, especially if the study progeny were genotyped with an extremely low number of markers. LD extent, MAF and reference panel size were positively correlated with imputation accuracy, a minimum number of 50 markers per chromosome and MAF of SNPs > 0.2 in soybean line were required to avoid a significant loss of imputation accuracy. Using the software, we imputed 5176 soybean lines in the soybean nested mapping population (NAM) with high-density markers of the 40 parents. The dataset containing 423,419 markers for 5176 lines and 40 parents was deposited at the Soybase. The imputed NAM dataset was further examined for the improvement of mapping quantitative trait loci (QTL) controlling soybean seed protein content. Most of the QTL identified were at identical or at similar position based on initial and imputed datasets; however, QTL intervals were greatly narrowed. The resulting genotypic dataset of NAM population will facilitate QTL mapping of traits and downstream applications. The information will also help to improve genotyping imputation accuracy in self-pollinated crops.
AB - Key message: Software for high imputation accuracy in soybean was identified. Imputed dataset could significantly reduce the interval of genomic regions controlling traits, thus greatly improve the efficiency of candidate gene identification. Abstract: Genotype imputation is a strategy to increase marker density of existing datasets without additional genotyping. We compared imputation performance of software BEAGLE 5.0, IMPUTE 5 and AlphaPlantImpute and tested software parameters that may help to improve imputation accuracy in soybean populations. Several factors including marker density, extent of linkage disequilibrium (LD), minor allele frequency (MAF), etc., were examined for their effects on imputation accuracy across different software. Our results showed that AlphaPlantImpute had a higher imputation accuracy than BEAGLE 5.0 or IMPUTE 5 tested in each soybean family, especially if the study progeny were genotyped with an extremely low number of markers. LD extent, MAF and reference panel size were positively correlated with imputation accuracy, a minimum number of 50 markers per chromosome and MAF of SNPs > 0.2 in soybean line were required to avoid a significant loss of imputation accuracy. Using the software, we imputed 5176 soybean lines in the soybean nested mapping population (NAM) with high-density markers of the 40 parents. The dataset containing 423,419 markers for 5176 lines and 40 parents was deposited at the Soybase. The imputed NAM dataset was further examined for the improvement of mapping quantitative trait loci (QTL) controlling soybean seed protein content. Most of the QTL identified were at identical or at similar position based on initial and imputed datasets; however, QTL intervals were greatly narrowed. The resulting genotypic dataset of NAM population will facilitate QTL mapping of traits and downstream applications. The information will also help to improve genotyping imputation accuracy in self-pollinated crops.
UR - http://www.scopus.com/inward/record.url?scp=85126117912&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126117912&partnerID=8YFLogxK
U2 - 10.1007/s00122-022-04070-7
DO - 10.1007/s00122-022-04070-7
M3 - Article
C2 - 35275252
AN - SCOPUS:85126117912
SN - 0040-5752
VL - 135
SP - 1797
EP - 1810
JO - Theoretical and Applied Genetics
JF - Theoretical and Applied Genetics
IS - 5
ER -