Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees

Usman W. Roshan, Tandy Warnow, Bernard M.E. Moret, Tiffani L. Williams

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics for producing phylogenetic trees produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP (and presumably ML) is NP-hard, such approaches do not scale when applied to large datasets. In this paper, we present a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to our family of Disk-Covering Methods (DCMs). We tested this new ' technique on ten large biological datasets ranging from 1,322 to 13,921 sequences and obtained dramatic speedups as well as significant improvements in accuracy (better than 99.99%) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for datasets at least ten times larger than was previously possible.

Original languageEnglish (US)
Title of host publicationProceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004
PublisherIEEE Computer Society
Pages98-109
Number of pages12
ISBN (Print)0769521940, 9780769521947
StatePublished - Jan 1 2004
Externally publishedYes
EventProceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004 - Stanford, CA, United States
Duration: Aug 16 2004Aug 19 2004

Publication series

NameProceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004

Other

OtherProceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004
CountryUnited States
CityStanford, CA
Period8/16/048/19/04

Fingerprint

Maximum likelihood

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Roshan, U. W., Warnow, T., Moret, B. M. E., & Williams, T. L. (2004). Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees. In Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004 (pp. 98-109). (Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004). IEEE Computer Society.

Rec-I-DCM3 : A fast algorithmic technique for reconstructing large phylogenetic trees. / Roshan, Usman W.; Warnow, Tandy; Moret, Bernard M.E.; Williams, Tiffani L.

Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004. IEEE Computer Society, 2004. p. 98-109 (Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Roshan, UW, Warnow, T, Moret, BME & Williams, TL 2004, Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees. in Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004. Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004, IEEE Computer Society, pp. 98-109, Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004, Stanford, CA, United States, 8/16/04.
Roshan UW, Warnow T, Moret BME, Williams TL. Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees. In Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004. IEEE Computer Society. 2004. p. 98-109. (Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004).
Roshan, Usman W. ; Warnow, Tandy ; Moret, Bernard M.E. ; Williams, Tiffani L. / Rec-I-DCM3 : A fast algorithmic technique for reconstructing large phylogenetic trees. Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004. IEEE Computer Society, 2004. pp. 98-109 (Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004).
@inproceedings{13db25d039c84fb2b4c1ff84ca2b0f71,
title = "Rec-I-DCM3: A fast algorithmic technique for reconstructing large phylogenetic trees",
abstract = "Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics for producing phylogenetic trees produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP (and presumably ML) is NP-hard, such approaches do not scale when applied to large datasets. In this paper, we present a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to our family of Disk-Covering Methods (DCMs). We tested this new ' technique on ten large biological datasets ranging from 1,322 to 13,921 sequences and obtained dramatic speedups as well as significant improvements in accuracy (better than 99.99{\%}) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for datasets at least ten times larger than was previously possible.",
author = "Roshan, {Usman W.} and Tandy Warnow and Moret, {Bernard M.E.} and Williams, {Tiffani L.}",
year = "2004",
month = "1",
day = "1",
language = "English (US)",
isbn = "0769521940",
series = "Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004",
publisher = "IEEE Computer Society",
pages = "98--109",
booktitle = "Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004",

}

TY - GEN

T1 - Rec-I-DCM3

T2 - A fast algorithmic technique for reconstructing large phylogenetic trees

AU - Roshan, Usman W.

AU - Warnow, Tandy

AU - Moret, Bernard M.E.

AU - Williams, Tiffani L.

PY - 2004/1/1

Y1 - 2004/1/1

N2 - Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics for producing phylogenetic trees produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP (and presumably ML) is NP-hard, such approaches do not scale when applied to large datasets. In this paper, we present a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to our family of Disk-Covering Methods (DCMs). We tested this new ' technique on ten large biological datasets ranging from 1,322 to 13,921 sequences and obtained dramatic speedups as well as significant improvements in accuracy (better than 99.99%) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for datasets at least ten times larger than was previously possible.

AB - Phylogenetic trees are commonly reconstructed based on hard optimization problems such as maximum parsimony (MP) and maximum likelihood (ML). Conventional MP heuristics for producing phylogenetic trees produce good solutions within reasonable time on small datasets (up to a few thousand sequences), while ML heuristics are limited to smaller datasets (up to a few hundred sequences). However, since MP (and presumably ML) is NP-hard, such approaches do not scale when applied to large datasets. In this paper, we present a new technique called Recursive-Iterative-DCM3 (Rec-I-DCM3), which belongs to our family of Disk-Covering Methods (DCMs). We tested this new ' technique on ten large biological datasets ranging from 1,322 to 13,921 sequences and obtained dramatic speedups as well as significant improvements in accuracy (better than 99.99%) in comparison to existing approaches. Thus, high-quality reconstructions can be obtained for datasets at least ten times larger than was previously possible.

UR - http://www.scopus.com/inward/record.url?scp=14044251530&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=14044251530&partnerID=8YFLogxK

M3 - Conference contribution

C2 - 16448004

AN - SCOPUS:14044251530

SN - 0769521940

SN - 9780769521947

T3 - Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004

SP - 98

EP - 109

BT - Proceedings - 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004

PB - IEEE Computer Society

ER -