Abstract
The perfect phylogeny problem is a classical problem in computational evolutionary biology in which a set of species/taxa is described by a set of qualitative characters. In recent years, the problem has been shown to be NP-complete in general, while the different fixed parameter versions can each be solved in polynomial time. In particular, Agarwala and Fernández-Baca have developed an O(23r(nk3 + k4)) algorithm for the perfect phylogeny problem for n species defined by k r-state characters [SIAM J. Comput., 23 (1994), pp. 1216-1224]. Since, commonly, the character data are drawn from alignments of molecular sequences, k is the length of the sequences and can thus be very large (in the hundreds or thousands). Thus, it is imperative to develop algorithms which run efficiently for large values of k. In this paper we make additional observations about the structure of the problem and produce an algorithm for the problem that runs in time O(22rk2n). We also show how it is possible to efficiently build a structure that implicitly represents the set of all perfect phytogenies and to randomly sample from that set.
Original language | English (US) |
---|---|
Pages (from-to) | 1749-1763 |
Number of pages | 15 |
Journal | SIAM Journal on Computing |
Volume | 26 |
Issue number | 6 |
DOIs | |
State | Published - Dec 1997 |
Externally published | Yes |
Keywords
- Combinatorial enumeration
- Dynamic programming
- Evolutionary trees
- Perfect phylogeny
- Polynomial delay algorithms
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Applied Mathematics
- Theoretical Computer Science