Obtaining highly accurate topology estimates of evolutionary trees from very short sequences

Daniel H. Huson, Scott Nettles, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

The evolutionary history of a set of species is represented by a phylogenetic tree, in other words, by a rooted, leaf-labelled tree, where internal nodes represent ancestral species and the leaves represent modern-day species. Accurate (or even boundedly inaccurate) topology reconstruction of large and divergent trees has long been considered one of the major challenges in systematic biology. None of the polynomial time methods developed by the theoretical computer science community has been shown to outperform the popular Neighbor-Joining method used by systematic biologists, with respect to topology estimation. (However, preliminary experiments indicate that two new variants of Neighbor-Joining, Bio-NJ and Weighbor, do exhibit improved performance.) In this paper, we present a simple polynomial time method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods. We analyze the performance of DCM-boosted distance methods under the general Markov model of evolution, and prove that, by using the DCM-boosted Buneman method, for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. Our experimental study (based upon simulating sequence evolution on model trees, generating about 1000 datasets) confirms these substantial reductions in error rates and extremely fast convergence rates. In particular, we report that DCM-boosted Neighbor-Joining has only 8% of the error of Neighbor-Joining under conditions that are adverse to Neighbor-Joining, and that on some trees it achieves acceptable error rates (less than 5% error in the topology estimation) from sequences of a few hundred nucleotides, while Neighbor-Joining needs more than 10,000 nucleotides to achieve the same level of accuracy.

Original language: English
Title of host publication: Proceedings of the Annual International Conference on Computational Molecular Biology, RECOMB
Place of Publication: New York, NY, United States
Publisher: ACM
Pages: 198-207
Number of pages: 10
State: Published - 1999
Externally published: Yes
Event: Proceedings of the 1999 3rd Annual International Conference on Computational Molecular Biology, RECOMB '99 - Lyon
Duration: Apr 11 1999 to Apr 14 1999

Other

Other: Proceedings of the 1999 3rd Annual International Conference on Computational Molecular Biology, RECOMB '99
City: Lyon
Period: 4/11/99 to 4/14/99

ASJC Scopus subject areas

  • General Biochemistry, Genetics and Molecular Biology
  • General Computer Science
