TY - JOUR
T1 - Improving precision and accuracy of genetic mapping with genotyping-by-sequencing data in outcrossing species
AU - LaBonte, Nicholas R.
AU - Zerpa-Catanho, Dessireé P.
AU - Liu, Siyao
AU - Xiao, Liang
AU - Dong, Hongxu
AU - Clark, Lindsay V.
AU - Sacks, Erik J.
N1 - Publisher Copyright: © 2024 The Author(s). GCB Bioenergy published by John Wiley & Sons Ltd.
PY - 2024/7
Y1 - 2024/7
N2 - Genotyping-by-sequencing (GBS) is a widely used strategy for obtaining large numbers of genetic markers in model and non-model organisms. In crop plants, GBS-derived marker datasets are frequently used to perform quantitative trait locus (QTL) mapping. In some plant species, however, high heterozygosity and complex genome structure mean that researchers must use care in handling GBS data to conduct QTL mapping most effectively. Such outbred crops include most of the perennial grass and tree species used for bioenergy. To identify strategies for increasing accuracy and precision of QTL mapping using GBS data in outbred crops, we conducted an empirical study of SNP-calling and genetic map-building pipeline parameters in a Miscanthus sinensis population, and a complementary simulation study to estimate the relationship between genome-wide error rate, read depth, and marker number. The bioenergy grass Miscanthus is an obligate outcrossing species with a recent (diploidized) whole-genome duplication. For the study of empirical M. sinensis data, we compared two SNP-calling methods (one non-reference-based and one reference-based), a series of depth filters (12×, 20×, 30×, and 40×) and two map-construction methods (i.e., marker ordering: linkage-only and order-corrected based on a reference genome). We found that correcting the order of markers on a linkage map by using a high-quality reference genome improved QTL precision (shorter confidence intervals). For typical GBS datasets of between 1000 and 5000 markers to build a genetic map for biparental populations, a depth filter set at 30× to 40× applied to outbred populations provided a genome-wide genotype-calling error rate of less than 1%, improved accuracy of QTL point estimates and minimized type I errors for identifying QTL. Based on these results, we recommend using a reference genome to correct the marker order of genetic maps and a robust genotype depth filter to improve QTL mapping for outbred crops.
KW - biparental cross
KW - experiment-wide error
KW - heterozygote undercalling
KW - linkage map
KW - outbred
KW - QTL
UR - http://www.scopus.com/inward/record.url?scp=85195402374&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85195402374&partnerID=8YFLogxK
U2 - 10.1111/gcbb.13167
DO - 10.1111/gcbb.13167
M3 - Article
AN - SCOPUS:85195402374
SN - 1757-1693
VL - 16
JO - GCB Bioenergy
JF - GCB Bioenergy
IS - 7
M1 - e13167
ER -