Next-generation DNA sequencing (NGS) produces vast amounts of DNA sequence data, but it is not specifically designed to generate data suitable for genetic mapping. Recently developed DNA library preparation methods for NGS have helped solve this problem, however, by combining the use of reduced representation libraries with DNA sample barcoding to generate genome-wide genotype data from a common set of genetic markers across a large number of samples. Here we use such a method, called genotyping-by-sequencing (GBS), to produce a data set for genetic mapping in an F1 population of apples (Malus × domestica) segregating for skin color. We show that GBS produces a relatively large, but extremely sparse, genotype matrix: over 270,000 SNPs were discovered but most SNPs have too much missing data across samples to be useful for genetic mapping. After filtering for genotype quality and missing data, only 6% of the 85 million DNA sequence reads contributed to useful genotype calls. Despite this limitation, using existing software and a set of simple heuristics, we generated a final genotype matrix containing 3967 SNPs from 89 DNA samples from a single lane of Illumina HiSeq and used it to create a saturated genetic linkage map and to identify a known QTL underlying apple skin color. We therefore demonstrate that GBS is a costeffective method for generating genome-wide SNP data suitable for genetic mapping in a highly diverse and heterozygous agricultural species. We anticipate future improvements to the GBS analysis pipeline presented here that will enhance the utility of next-generation DNA sequence data for the purposes of genetic mapping across diverse species.
ASJC Scopus subject areas
- Molecular Biology