TY - GEN
T1 - Overlap-based genome assembly from variable-length reads
AU - Hui, Joseph
AU - Shomorony, Ilan
AU - Ramchandran, Kannan
AU - Courtade, Thomas A.
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/8/10
Y1 - 2016/8/10
N2 - Recently developed high-throughput sequencing platforms can generate very long reads, making the perfect assembly of whole genomes information-theoretically possible [1]. One of the challenges in achieving this goal in practice, however, is that traditional assembly algorithms based on the de Bruijn graph framework cannot handle the high error rates of long-read technologies. On the other hand, overlap-based approaches such as string graphs [2] are very robust to errors, but cannot achieve the theoretical lower bounds. In particular, these methods handle the variable-length reads provided by long-read technologies in a suboptimal manner. In this work, we introduce a new assembly algorithm with two desirable features in the context of long-read sequencing: (1) it is an overlap-based method, thus being more resilient to read errors than de Bruijn graph approaches; and (2) it achieves the information-theoretic bounds even in the variable-length read setting.
AB - Recently developed high-throughput sequencing platforms can generate very long reads, making the perfect assembly of whole genomes information-theoretically possible [1]. One of the challenges in achieving this goal in practice, however, is that traditional assembly algorithms based on the de Bruijn graph framework cannot handle the high error rates of long-read technologies. On the other hand, overlap-based approaches such as string graphs [2] are very robust to errors, but cannot achieve the theoretical lower bounds. In particular, these methods handle the variable-length reads provided by long-read technologies in a suboptimal manner. In this work, we introduce a new assembly algorithm with two desirable features in the context of long-read sequencing: (1) it is an overlap-based method, thus being more resilient to read errors than de Bruijn graph approaches; and (2) it achieves the information-theoretic bounds even in the variable-length read setting.
UR - http://www.scopus.com/inward/record.url?scp=84985953133&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84985953133&partnerID=8YFLogxK
U2 - 10.1109/ISIT.2016.7541453
DO - 10.1109/ISIT.2016.7541453
M3 - Conference contribution
AN - SCOPUS:84985953133
T3 - IEEE International Symposium on Information Theory - Proceedings
SP - 1018
EP - 1022
BT - Proceedings - ISIT 2016; 2016 IEEE International Symposium on Information Theory
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE International Symposium on Information Theory, ISIT 2016
Y2 - 10 July 2016 through 15 July 2016
ER -