Evaluating Illumina-, Nanopore-, and PacBio-based genome assembly strategies with the bald notothen, Trematomus borchgrevinki

Niraj Rayamajhi, Chi-Hing Christina Cheng, Julian M Catchen

Research output: Contribution to journal › Article › peer-review

Abstract

For any genome-based research, a robust genome assembly is required. De novo assembly strategies have evolved with changes in DNA sequencing technologies and have passed through at least 3 phases: (1) short-read only, (2) short- and long-read hybrid, and (3) long-read only assemblies. Each of the phases has its own error model. We hypothesized that hidden short-read scaffolding errors and erroneous long-read contigs degrade the quality of short- and long-read hybrid assemblies. We assembled the genome of Trematomus borchgrevinki from data generated during each of the 3 phases and assessed the quality problems we encountered. We developed strategies such as k-mer-assembled region replacement, parameter optimization, and long-read sampling to address the error models. We demonstrated that a k-mer-based strategy improved short-read assemblies as measured by Benchmarking Universal Single-Copy Orthologs (BUSCO), while mate-pair libraries introduced hidden scaffolding errors and perturbed BUSCO scores. Furthermore, we found that although hybrid assemblies can generate higher contiguity, they tend to suffer from lower quality. In addition, we found that long-read-only assemblies can be optimized for contiguity by subsampling length-restricted raw reads. Our results indicate that long-read contig assembly is the current best choice and that assemblies from phases 1 and 2 were of lower quality.

Original language: English (US)
Article number: jkac192
Journal: G3 Genes|Genomes|Genetics
Volume: 12
Issue number: 11
Early online date: Jul 29 2022
State: Published - Nov 4 2022

Keywords

  • genome assembly
  • notothenioids
  • long-read assembly
  • short-read assembly
  • k-mer analysis

ASJC Scopus subject areas

  • Genetics (clinical)
  • Genetics
  • Molecular Biology
