Abstract
Restriction site-Associated DNA sequencing (RAD-seq) has become a widely adopted method for genotyping populations of model and non-model organisms. Generating a reliable set of loci for downstream analysis requires appropriate use of bioinformatics software, such as the program stacks. Using three empirical RAD-seq datasets, we demonstrate a method for optimising a de novo assembly of loci using stacks. By iterating values of the program's main parameters and plotting resultant core metrics for visualisation, researchers can gain a much better understanding of their dataset and select an optimal set of parameters; we present the 80% rule as a generally effective method to select the core parameters for stacks. Visualisation of the metrics plotted for the three RAD-seq datasets shows that they differ in the optimal parameters that should be used to maximise the amount of available biological information. We also demonstrate that building loci de novo and then integrating alignment positions is more effective than aligning raw reads directly to a reference genome. Our methods will help the community in honing the analytical skills necessary to accurately assemble a RAD-seq dataset.
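The selection step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes the 80% rule means counting loci that are present in at least 80% of samples and preferring the parameter value that yields the most such loci. The function names, the input layout (one list of per-locus sample counts per tested parameter value), and the tie-breaking choice are all hypothetical; producing those counts from stacks output is left out.

```python
from typing import Dict, List


def count_loci_above_threshold(
    samples_per_locus: List[int], n_samples: int, threshold: float = 0.8
) -> int:
    """Count loci found in at least `threshold` of the samples.

    `samples_per_locus` holds, for each assembled locus, the number of
    samples in which that locus was recovered (hypothetical input layout).
    """
    return sum(1 for n in samples_per_locus if n >= threshold * n_samples)


def pick_parameter(runs: Dict[int, List[int]], n_samples: int) -> int:
    """Return the parameter value whose run yields the most widely shared loci.

    `runs` maps each tested parameter value to the per-locus sample counts
    from that run. Ties go to the smallest value (an assumption made here
    for determinism, not something stated in the abstract).
    """
    counts = {
        value: count_loci_above_threshold(loci, n_samples)
        for value, loci in runs.items()
    }
    return min(counts, key=lambda value: (-counts[value], value))


if __name__ == "__main__":
    # Made-up counts for 10 samples across three tested parameter values.
    example_runs = {
        1: [10, 8, 3],
        2: [10, 9, 8, 8],
        3: [10, 9, 8, 2],
    }
    print(pick_parameter(example_runs, n_samples=10))
```

In practice the per-value counts would also be plotted, as the abstract suggests, so that the shape of the curve informs the choice and not only its maximum.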
Original language | English (US) |
---|---|
Pages (from-to) | 1360-1373 |
Number of pages | 14 |
Journal | Methods in Ecology and Evolution |
Volume | 8 |
Issue number | 10 |
State | Published - Oct 2017 |
Keywords
- RAD-seq
- alignment
- de novo assembly
- parameter optimisation
- population genetics
- stacks
ASJC Scopus subject areas
- Ecology, Evolution, Behavior and Systematics
- Ecological Modeling