Abstract
Motivation: While phylogenetic analyses of datasets containing 1000-5000 sequences are challenging for existing methods, the estimation of substantially larger phylogenies poses a problem of much greater complexity and scale.

Methods: We present DACTAL, a method for phylogeny estimation that produces trees from unaligned sequence datasets without ever needing to estimate an alignment on the entire dataset. DACTAL combines iteration with a novel divide-and-conquer approach, so that each iteration begins with a tree produced in the prior iteration, decomposes the taxon set into overlapping subsets, estimates trees on each subset, and then combines the smaller trees into a tree on the full taxon set using a new supertree method. We prove that DACTAL is guaranteed to produce the true tree under certain conditions. We compare DACTAL to SATé and maximum likelihood trees on estimated alignments using simulated and real datasets with 1000-27 643 taxa.

Results: Our studies show that on average DACTAL yields more accurate trees than the two-phase methods we studied on very large datasets that are difficult to align, and has approximately the same accuracy on the easier datasets. The comparison to SATé shows that both have the same accuracy, but that DACTAL achieves this accuracy in a fraction of the time. Furthermore, DACTAL can analyze larger datasets than SATé, including a dataset with almost 28 000 sequences.
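The iteration described in the Methods paragraph can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how subsets are chosen, how subset trees are estimated, or how the supertree method works, so `overlapping_subsets` (a simple sliding window over a leaf ordering), `iterate`, and the callables `leaves_of`, `estimate_subtree`, and `merge` are all hypothetical placeholders chosen only to show the control flow.

```python
from typing import Callable, List, Sequence, TypeVar

Tree = TypeVar("Tree")


def overlapping_subsets(leaf_order: Sequence[str], size: int, overlap: int) -> List[List[str]]:
    """Slide a window over a leaf ordering so consecutive subsets share `overlap` taxa.

    Assumption: a stand-in for the real decomposition, which the abstract does not specify.
    """
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    subsets: List[List[str]] = []
    start = 0
    while True:
        subsets.append(list(leaf_order[start:start + size]))
        if start + size >= len(leaf_order):
            break  # the last window reached the end of the ordering
        start += step
    return subsets


def iterate(
    start_tree: Tree,
    leaves_of: Callable[[Tree], Sequence[str]],     # hypothetical: leaf ordering of a tree
    estimate_subtree: Callable[[List[str]], Tree],  # hypothetical: align + infer tree on a subset
    merge: Callable[[List[Tree]], Tree],            # hypothetical: supertree step
    iterations: int,
    size: int,
    overlap: int,
) -> Tree:
    """Each pass starts from the previous tree, decomposes, estimates, and recombines."""
    tree = start_tree
    for _ in range(iterations):
        subsets = overlapping_subsets(leaves_of(tree), size, overlap)
        subtrees = [estimate_subtree(subset) for subset in subsets]
        tree = merge(subtrees)
    return tree


# Toy usage: a "tree" is just a list of leaf names, so only the control flow is exercised.
toy_result = iterate(
    start_tree=["e", "d", "c", "b", "a"],
    leaves_of=list,
    estimate_subtree=sorted,
    merge=lambda trees: sorted(set().union(*trees)),
    iterations=2,
    size=3,
    overlap=1,
)
```

In the toy usage every taxon appears in at least one subset, and neighbouring subsets share a taxon, which is the property that lets a merge step stitch the smaller results back into one structure on the full taxon set.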
Original language | English (US) |
---|---|
Article number | bts218 |
Pages (from-to) | i274-i282 |
Journal | Bioinformatics |
Volume | 28 |
Issue number | 12 |
DOIs | |
State | Published - Jun 2012 |
Externally published | Yes |
ASJC Scopus subject areas
- Statistics and Probability
- Biochemistry
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics
Fingerprint
Dive into the research topics of 'DACTAL: Divide-and-conquer trees (almost) without alignments'. Together they form a unique fingerprint.
Datasets
- Data for SuperFine, DACTAL, and BeeTLe
Swenson, M. S. (Creator), Suri, R. (Creator), Linder, C. R. (Creator), Warnow, T. (Creator), Nguyen, N.-P. (Creator), Mirarab, S. (Creator), Neves, D. T. (Creator), Sobral, J. L. (Creator), Pingali, K. (Creator), Nelesen, S. (Creator), Liu, K. (Creator) & Wang, L.-S. (Creator), University of Illinois Urbana-Champaign, Sep 20 2011
DOI: 10.13012/B2IDB-2952208_V1
Dataset