The accuracy of fast phylogenetic methods for large datasets.

Luay Nakhleh, Bernard M E Moret, Usman Roshan, Katherine St John, Jerry Sun, Tandy Warnow

Research output: Chapter in Book/Report/Conference proceeding (Chapter)

Abstract

Whole-genome phylogenetic studies require various sources of phylogenetic signals to produce an accurate picture of the evolutionary history of a group of genomes. In particular, sequence-based reconstruction will play an important role, especially in resolving more recent events. But using sequences at the level of whole genomes means working with very large amounts of data (large numbers of sequences) as well as large phylogenetic distances, so reconstruction methods must be fast, robust, and accurate. We study the accuracy, convergence rate, and speed of several fast reconstruction methods: neighbor-joining, Weighbor (a weighted version of neighbor-joining), greedy parsimony, and a new phylogenetic reconstruction method based on disk-covering and parsimony search (DCM-NJ + MP). Our study uses extensive simulations based on random birth-death trees, with controlled deviations from ultrametricity. We find that Weighbor, thanks to its sophisticated handling of probabilities, outperforms the other methods for short sequences, while our new method is the best choice for sequence lengths above 100. For very large sequence lengths, all four methods have similar accuracy, so the speed of neighbor-joining and greedy parsimony makes them the two methods of choice.
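
For readers unfamiliar with neighbor-joining, the simplest of the four methods compared, the following is a minimal sketch of the textbook algorithm. It is not the authors' implementation, and it does not cover Weighbor, greedy parsimony, or DCM-NJ + MP. The function name `neighbor_joining`, the internal node labels `u0`, `u1`, ..., and the five-taxon distance matrix are all illustrative assumptions, not taken from the chapter.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return (node, neighbor, branch_length) edges of an unrooted NJ tree.

    names: list of taxon labels (assumed not to clash with "u0", "u1", ...).
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    nodes = list(names)
    d = dict(dist)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        total = {x: sum(d[frozenset((x, y))] for y in nodes if y != x)
                 for x in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - total(i) - total(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - total[p[0]] - total[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"u{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the joined pair to the new internal node.
        li = dij / 2 + (total[i] - total[j]) / (2 * (n - 2))
        lj = dij - li
        edges.append((i, u, li))
        edges.append((j, u, lj))
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                ) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes with the remaining distance.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Toy additive distance matrix on five taxa (illustrative values only).
taxa = ["a", "b", "c", "d", "e"]
rows = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
toy_dist = {
    frozenset((taxa[r], taxa[c])): rows[r][c]
    for r in range(5) for c in range(r + 1, 5)
}

for edge in neighbor_joining(taxa, toy_dist):
    print(edge)
```

Each round scans all pairs of the remaining nodes, so this sketch runs in cubic time in the number of taxa. That polynomial cost is the reason neighbor-joining counts as a fast method here, although the sketch makes no attempt to reproduce the running times studied in the chapter.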

Original language: English
Title of host publication: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
Pages: 211-222
Number of pages: 12
State: Published - Apr 3 2002
Externally published: Yes

Cite this

Nakhleh, L., Moret, B. M. E., Roshan, U., St John, K., Sun, J., & Warnow, T. (2002). The accuracy of fast phylogenetic methods for large datasets. In Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing (pp. 211-222)

@inbook{dcedff9d6a8b4d0ba969b4ebfd3ac765,
title = "The accuracy of fast phylogenetic methods for large datasets.",
author = "Luay Nakhleh and Moret, {Bernard M E} and Usman Roshan and {St John}, Katherine and Jerry Sun and Tandy Warnow",
year = "2002",
month = "4",
day = "3",
language = "English",
pages = "211--222",
booktitle = "Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing",

}

TY - CHAP

T1 - The accuracy of fast phylogenetic methods for large datasets.

AU - Nakhleh, Luay

AU - Moret, Bernard M E

AU - Roshan, Usman

AU - St John, Katherine

AU - Sun, Jerry

AU - Warnow, Tandy

PY - 2002/4/3

Y1 - 2002/4/3

UR - http://www.scopus.com/inward/record.url?scp=0036372890&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036372890&partnerID=8YFLogxK

M3 - Chapter

C2 - 11928477

AN - SCOPUS:0036372890

SP - 211

EP - 222

BT - Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

ER -