Disk-covering, a fast-converging method for phylogenetic tree reconstruction

Daniel H. Huson, Scott M. Nettles, Tandy J. Warnow

Research output: Contribution to journalArticle

Abstract

The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes- Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.

Original languageEnglish (US)
Pages (from-to)369-386
Number of pages18
JournalJournal of Computational Biology
Volume6
Issue number3-4
DOIs
StatePublished - Sep 1 1999
Externally publishedYes

Fingerprint

Phylogenetic Tree
Covering
Markov Model
Leaves
Topology
Polynomials
Labeled Trees
Cantor
Phylogenetics
Inaccurate
History
Biology
Error Rate
Experimental Study
Internal
Polynomial
Vertex of a graph

Keywords

  • Algorithms
  • Biomolecular data
  • Chordal graphs
  • Clustering
  • Evolution
  • Jukes Cantor
  • Phylogenetic trees

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

Cite this

Disk-covering, a fast-converging method for phylogenetic tree reconstruction. / Huson, Daniel H.; Nettles, Scott M.; Warnow, Tandy J.

In: Journal of Computational Biology, Vol. 6, No. 3-4, 01.09.1999, p. 369-386.

Research output: Contribution to journalArticle

@article{14853aac0fba4262a3a1158937cf057f,
title = "Disk-covering, a fast-converging method for phylogenetic tree reconstruction",
abstract = "The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes- Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.",
keywords = "Algorithms, Biomolecular data, Chordal graphs, Clustering, Evolution, Jukes Cantor, Phylogenetic trees",
author = "Huson, {Daniel H.} and Nettles, {Scott M.} and Warnow, {Tandy J.}",
year = "1999",
month = "9",
day = "1",
doi = "10.1089/106652799318337",
language = "English (US)",
volume = "6",
pages = "369--386",
journal = "Journal of Computational Biology",
issn = "1066-5277",
publisher = "Mary Ann Liebert Inc.",
number = "3-4",

}

TY - JOUR

T1 - Disk-covering, a fast-converging method for phylogenetic tree reconstruction

AU - Huson, Daniel H.

AU - Nettles, Scott M.

AU - Warnow, Tandy J.

PY - 1999/9/1

Y1 - 1999/9/1

N2 - The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes- Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.

AB - The evolutionary history of a set of species is represented by a phylogenetic tree, which is a rooted, leaf-labeled tree, where internal nodes represent ancestral species and the leaves represent modern day species. Accurate (or even boundedly inaccurate) topology reconstructions of large and divergent trees from realistic length sequences have long been considered one of the major challenges in systematic biology. In this paper, we present a simple method, the Disk-Covering Method (DCM), which boosts the performance of base phylogenetic methods under various Markov models of evolution. We analyze the performance of DCM-boosted distance methods under the Jukes- Cantor Markov model of biomolecular sequence evolution, and prove that for almost all trees, polylogarithmic length sequences suffice for complete accuracy with high probability, while polynomial length sequences always suffice. We also provide an experimental study based upon simulating sequence evolution on model trees. This study confirms substantial reductions in error rates at realistic sequence lengths.

KW - Algorithms

KW - Biomolecular data

KW - Chordal graphs

KW - Clustering

KW - Evolution

KW - Jukes Cantor

KW - Phylogenetic trees

UR - http://www.scopus.com/inward/record.url?scp=0032746022&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032746022&partnerID=8YFLogxK

U2 - 10.1089/106652799318337

DO - 10.1089/106652799318337

M3 - Article

C2 - 10582573

AN - SCOPUS:0032746022

VL - 6

SP - 369

EP - 386

JO - Journal of Computational Biology

JF - Journal of Computational Biology

SN - 1066-5277

IS - 3-4

ER -