TY - JOUR
T1 - Nanopore sequencing technology and tools for genome assembly
T2 - Computational analysis of the current state, bottlenecks and future directions
AU - Senol Cali, Damla
AU - Kim, Jeremie S.
AU - Ghose, Saugata
AU - Alkan, Can
AU - Mutlu, Onur
N1 - Funding Information:
This work was supported by a grant from the National Institutes of Health to O.M. and C.A. (grant number HG006004); an installation grant from the European Molecular Biology Organization to C.A. (grant number EMBO-IG 2521); and gifts from Google, Intel, Samsung and VMware.
Publisher Copyright:
© The Author(s) 2018. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected].
Copyright:
Copyright 2019 Elsevier B.V., All rights reserved.
PY - 2018/3/27
Y1 - 2018/3/27
AB - Nanopore sequencing technology has the potential to render other sequencing technologies obsolete with its ability to generate long reads and provide portability. However, high error rates of the technology pose a challenge while generating accurate genome assemblies. The tools used for nanopore sequence analysis are of critical importance, as they should overcome the high error rates of the technology. Our goal in this work is to comprehensively analyze current publicly available tools for nanopore sequence analysis to understand their advantages, disadvantages and performance bottlenecks. It is important to understand where the current tools do not perform well to develop better tools. To this end, we (1) analyze the multiple steps and the associated tools in the genome assembly pipeline using nanopore sequence data, and (2) provide guidelines for determining the appropriate tools for each step. Based on our analyses, we make four key observations: (1) The choice of the tool for basecalling plays a critical role in overcoming the high error rates of nanopore sequencing technology. (2) Read-to-read overlap finding tools, GraphMap and Minimap, perform similarly in terms of accuracy. However, Minimap uses less memory and is faster than GraphMap. (3) There is a trade-off between accuracy and performance when deciding on the appropriate tool for the assembly step. The fast but less accurate assembler Miniasm can be used for quick initial assembly, and further polishing can be applied on top of it to increase the accuracy, which leads to faster overall assembly. (4) The state-of-the-art polishing tool, Racon, generates high-quality consensus sequences while providing a significant speedup over another polishing tool, Nanopolish. We analyze various combinations of different tools and expose the trade-offs between accuracy, performance, memory usage and scalability.
We conclude that our observations can guide researchers and practitioners in making conscious and effective choices for each step of the genome assembly pipeline using nanopore sequence data. Also, with the help of the bottlenecks we have found, developers can improve the current tools or build new ones that are both accurate and fast, to overcome the high error rates of the nanopore sequencing technology.
KW - assembly
KW - genome analysis
KW - genome sequencing
KW - mapping
KW - nanopore sequencing
UR - http://www.scopus.com/inward/record.url?scp=85072947405&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072947405&partnerID=8YFLogxK
U2 - 10.1093/bib/bby017
DO - 10.1093/bib/bby017
M3 - Article
C2 - 29617724
AN - SCOPUS:85072947405
SN - 1467-5463
VL - 20
SP - 1542
EP - 1559
JO - Briefings in bioinformatics
JF - Briefings in bioinformatics
IS - 4
ER -