TY - JOUR
T1 - Simulation with RADinitio improves RADseq experimental design and sheds light on sources of missing data
AU - Rivera-Colón, Angel G.
AU - Rochette, Nicolas C.
AU - Catchen, Julian M.
N1 - The authors would like to thank José Cerca for coming up with the RADinitio name and for his assessment of the Stacks mailing list, Matt Streisfeld for early access to the M. aurantiacus reference assembly, and Yoel Stuart and Daniel Bolnick for the stickleback ddRAD data. We would like to thank Peter Ralph for his assistance with pyslim, tskit, and msprime. We also want to thank Niraj Rayamajhi, Jan Stefka, and Eric Normandeau for early testing of the software. AGR and NCR were supported by NSF grant 1645087.
PY - 2021/2
Y1 - 2021/2
AB - Restriction-site associated DNA sequencing (RADseq) has become a powerful and versatile tool in modern population genomics, enabling large-scale evolutionary and genomic analyses in otherwise inaccessible biological systems. With its widespread use, different variants on the protocol have been developed to suit specific experimental needs. Researchers face the challenge of choosing the optimal molecular and sequencing protocols for their reduced representation experimental design, an often-complicated process. Strategic errors can lead to biased data generation that has reduced power to answer biological questions. Here, we present RADinitio, simulation software for the selection and optimization of RADseq experiments via the generation of sequencing data that behave similarly to empirical sources. RADinitio provides an evolutionary simulation of populations, implementation of various RADseq protocols with customizable parameters, and thorough assessment of missing data. We test the efficacy of the software using different RAD protocols across several organisms, highlighting the importance of protocol selection on the magnitude and quality of data acquired. Additionally, we test the effects of RAD library preparation and sequencing on allelic dropout, observing that library preparation and sequencing often contribute more to missing alleles than population-level variation.
KW - RADseq
KW - bioinformatics
KW - genetics
KW - population
KW - simulations
UR - http://www.scopus.com/inward/record.url?scp=85084987161&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084987161&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.13163
DO - 10.1111/1755-0998.13163
M3 - Article
C2 - 32275349
AN - SCOPUS:85084987161
SN - 1755-098X
VL - 21
SP - 363
EP - 378
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
IS - 2
ER -