BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads

Yun Heo, Xiao Long Wu, Deming Chen, Jian Ma, Wen Mei Hwu

Research output: Contribution to journal (Article)

Abstract

Motivation: Rapid advances in next-generation sequencing (NGS) technology have led to exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting the errors. Unfortunately, all the previous error correction methods required a large amount of memory, making it unsuitable to process reads from large genomes with commodity computers. Results: We present a novel algorithm that produces accurate correction results with much less memory compared with previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter, and is also able to tolerate a higher false-positive rate, thus allowing us to correct errors with a 40× memory usage reduction on average compared with previous methods. Meanwhile, BLESS can extend reads like DNA assemblers to correct errors at the end of reads. Evaluations using real and simulated reads showed that BLESS could generate more accurate results than existing solutions. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors.
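The abstract's central data structure, a Bloom filter used to test k-mer membership in little memory, can be illustrated with a minimal Python sketch. This is not the BLESS implementation: the class name `BloomFilter`, the helper `kmers`, the SHA-256 double-hashing scheme, and the idea of flagging read k-mers that are absent from the filter are all assumptions made for illustration. The sizing formulas are the standard textbook ones for a target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical; not taken from BLESS)."""

    def __init__(self, expected_items: int, false_positive_rate: float):
        # Standard sizing: m = -n ln(p) / (ln 2)^2 bits, k = (m / n) ln 2 hashes.
        self.num_bits = max(1, math.ceil(
            -expected_items * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(
            self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive k bit positions from two 64-bit values.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May return a false positive, but never a false negative.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(read: str, k: int) -> list[str]:
    """Return every length-k substring of a read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


# Assumed workflow: store k-mers from a trusted sequence, then flag
# k-mers of a new read that the filter does not recognise.
trusted = kmers("ACGTACGTTAGC", 4)
bloom = BloomFilter(expected_items=1000, false_positive_rate=0.01)
for km in trusted:
    bloom.add(km)

suspect = [km for km in kmers("ACGTACCTTAGC", 4) if km not in bloom]
```

A k-mer reported as absent is definitely absent from the stored set, while a k-mer reported as present may be a false positive. The abstract's point about tolerating a higher false-positive rate corresponds to choosing a larger `false_positive_rate`, which shrinks `num_bits`.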

Original language: English (US)
Pages (from-to): 1354-1362
Number of pages: 9
Journal: Bioinformatics
Volume: 30
Issue number: 10
DOI: 10.1093/bioinformatics/btu030
State: Published - May 15 2014

Fingerprint

Bloom Filter
Error correction
Sequencing
High Throughput
Throughput
Data storage equipment
Genomics
Genome
Technology
DNA
False Positive
Genes
Evaluation

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

Cite this

BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads. / Heo, Yun; Wu, Xiao Long; Chen, Deming; Ma, Jian; Hwu, Wen Mei.

In: Bioinformatics, Vol. 30, No. 10, 15.05.2014, p. 1354-1362.

@article{6449294799304007abb3e5947d91456f,
title = "BLESS: Bloom filter-based error correction solution for high-throughput sequencing reads",
abstract = "Motivation: Rapid advances in next-generation sequencing (NGS) technology have led to exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting the errors. Unfortunately, all the previous error correction methods required a large amount of memory, making it unsuitable to process reads from large genomes with commodity computers. Results: We present a novel algorithm that produces accurate correction results with much less memory compared with previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter, and is also able to tolerate a higher false-positive rate, thus allowing us to correct errors with a 40× memory usage reduction on average compared with previous methods. Meanwhile, BLESS can extend reads like DNA assemblers to correct errors at the end of reads. Evaluations using real and simulated reads showed that BLESS could generate more accurate results than existing solutions. After errors were corrected using BLESS, 69{\%} of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50{\%} longer with 66{\%} fewer assembly errors.",
author = "Yun Heo and Wu, {Xiao Long} and Deming Chen and Jian Ma and Hwu, {Wen Mei}",
year = "2014",
month = "5",
day = "15",
doi = "10.1093/bioinformatics/btu030",
language = "English (US)",
volume = "30",
pages = "1354--1362",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "10",

}

TY - JOUR

T1 - BLESS

T2 - Bloom filter-based error correction solution for high-throughput sequencing reads

AU - Heo, Yun

AU - Wu, Xiao Long

AU - Chen, Deming

AU - Ma, Jian

AU - Hwu, Wen Mei

PY - 2014/5/15

Y1 - 2014/5/15

AB - Motivation: Rapid advances in next-generation sequencing (NGS) technology have led to exponential increase in the amount of genomic information. However, NGS reads contain far more errors than data from traditional sequencing methods, and downstream genomic analysis results can be improved by correcting the errors. Unfortunately, all the previous error correction methods required a large amount of memory, making it unsuitable to process reads from large genomes with commodity computers. Results: We present a novel algorithm that produces accurate correction results with much less memory compared with previous solutions. The algorithm, named BLoom-filter-based Error correction Solution for high-throughput Sequencing reads (BLESS), uses a single minimum-sized Bloom filter, and is also able to tolerate a higher false-positive rate, thus allowing us to correct errors with a 40× memory usage reduction on average compared with previous methods. Meanwhile, BLESS can extend reads like DNA assemblers to correct errors at the end of reads. Evaluations using real and simulated reads showed that BLESS could generate more accurate results than existing solutions. After errors were corrected using BLESS, 69% of initially unaligned reads could be aligned correctly. Additionally, de novo assembly results became 50% longer with 66% fewer assembly errors.

UR - http://www.scopus.com/inward/record.url?scp=84900802154&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84900802154&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btu030

DO - 10.1093/bioinformatics/btu030

M3 - Article

C2 - 24451628

AN - SCOPUS:84900802154

VL - 30

SP - 1354

EP - 1362

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 10

ER -