A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds

Andreas Wallberg, Ignas Bunikis, Olga Vinnere Pettersson, Mai Britt Mosbech, Anna K. Childers, Jay D. Evans, Alexander S. Mikheyev, Hugh M. Robertson, Gene E. Robinson, Matthew T. Webster

Research output: Contribution to journal › Article

Abstract

Background: The ability to generate long sequencing reads and access long-range linkage information is revolutionizing the quality and completeness of genome assemblies. Here we use a hybrid approach that combines data from four genome sequencing and mapping technologies to generate a new genome assembly of the honeybee Apis mellifera. We first generated contigs based on PacBio sequencing libraries, which were then merged with linked-read 10x Chromium data followed by scaffolding using a BioNano optical genome map and a Hi-C chromatin interaction map, complemented by a genetic linkage map.

Results: Each of the assembly steps reduced the number of gaps and incorporated a substantial amount of additional sequence into scaffolds. The new assembly (Amel-HAv3) is significantly more contiguous and complete than the previous one (Amel-4.5), which was based mainly on Sanger sequencing reads. N50 of contigs is 120-fold higher (5.381 Mbp compared to 0.053 Mbp) and we anchor >98% of the sequence to chromosomes. All of the 16 chromosomes are represented as single scaffolds with an average of three sequence gaps per chromosome. The improvements are largely due to the inclusion of repetitive sequence that was unplaced in previous assemblies. In particular, our assembly is highly contiguous across centromeres and telomeres and includes hundreds of AvaI and AluI repeats associated with these features.

Conclusions: The improved assembly will be of utility for refining gene models, studying genome function, mapping functional genetic variation, identifying structural variants, and comparative genomics.
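The abstract reports contiguity as contig N50. As a reminder of what that statistic measures, here is a minimal Python sketch, assuming the usual definition: the length L such that contigs of length L or longer together contain at least half of all assembled bases. The function name `n50` and the toy contig lengths are illustrative assumptions and are not taken from the paper or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig (or scaffold) lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and return the length at which the running total first reaches at
    least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly (lengths in kbp), not real honeybee contigs.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 depends only on the sorted length distribution, merging many short contigs into a few long ones raises it sharply, which is why it is a common headline metric when comparing assembly versions.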

Original language: English (US)
Article number: 275
Journal: BMC Genomics
Volume: 20
Issue number: 1
DOI: https://doi.org/10.1186/s12864-019-5642-0
State: Published - Apr 8 2019

Fingerprint

Bees
Chromosomes
Chromosome Mapping
Genome
Genomic Structural Variation
Chromosomes, Human, Pair 16
Genetic Linkage
Centromere
Nucleic Acid Repetitive Sequences
Telomere
Chromium
Chromatin
Technology
Genes

Keywords

  • Centromeres
  • Genome assembly
  • Hi-C
  • Linked-read sequencing
  • Optical mapping
  • Single-molecule real-time (SMRT) sequencing
  • Telomeres

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Wallberg, A., Bunikis, I., Pettersson, O. V., Mosbech, M. B., Childers, A. K., Evans, J. D., ... Webster, M. T. (2019). A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds. BMC Genomics, 20(1), [275]. https://doi.org/10.1186/s12864-019-5642-0

@article{78fd0cd05d0c47f69454d58aec246ac7,
title = "A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds",
keywords = "Centromeres, Genome assembly, Hi-C, Linked-read sequencing, Optical mapping, Single-molecule real-time (SMRT) sequencing, Telomeres",
author = "Andreas Wallberg and Ignas Bunikis and Pettersson, {Olga Vinnere} and Mosbech, {Mai Britt} and Childers, {Anna K.} and Evans, {Jay D.} and Mikheyev, {Alexander S.} and Robertson, {Hugh M} and Robinson, {Gene E} and Webster, {Matthew T.}",
year = "2019",
month = "4",
day = "8",
doi = "10.1186/s12864-019-5642-0",
language = "English (US)",
volume = "20",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY  - JOUR
T1  - A hybrid de novo genome assembly of the honeybee, Apis mellifera, with chromosome-length scaffolds
AU  - Wallberg, Andreas
AU  - Bunikis, Ignas
AU  - Pettersson, Olga Vinnere
AU  - Mosbech, Mai Britt
AU  - Childers, Anna K.
AU  - Evans, Jay D.
AU  - Mikheyev, Alexander S.
AU  - Robertson, Hugh M
AU  - Robinson, Gene E
AU  - Webster, Matthew T.
PY  - 2019/4/8
Y1  - 2019/4/8
KW  - Centromeres
KW  - Genome assembly
KW  - Hi-C
KW  - Linked-read sequencing
KW  - Optical mapping
KW  - Single-molecule real-time (SMRT) sequencing
KW  - Telomeres
UR  - http://www.scopus.com/inward/record.url?scp=85064105277&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=85064105277&partnerID=8YFLogxK
U2  - 10.1186/s12864-019-5642-0
DO  - 10.1186/s12864-019-5642-0
M3  - Article
VL  - 20
JO  - BMC Genomics
JF  - BMC Genomics
SN  - 1471-2164
IS  - 1
M1  - 275
ER  -