A fast algorithm for the computation and enumeration of perfect phylogenies

Sampath Kannan, Tandy Warnow

Research output: Contribution to journalArticle

Abstract

The perfect phylogeny problem is a classical problem in computational evolutionary biology in which a set of species/taxa is described by a set of qualitative characters. In recent years, the problem has been shown to be NP-complete in general, while the different fixed parameter versions can each be solved in polynomial time. In particular, Agarwala and Fernández-Baca have developed an O(23r(nk3 + k4)) algorithm for the perfect phylogeny problem for n species defined by k r-state characters [SIAM J. Comput., 23 (1994), pp. 1216-1224]. Since, commonly, the character data are drawn from alignments of molecular sequences, k is the length of the sequences and can thus be very large (in the hundreds or thousands). Thus, it is imperative to develop algorithms which run efficiently for large values of k. In this paper we make additional observations about the structure of the problem and produce an algorithm for the problem that runs in time O(22rk2n). We also show how it is possible to efficiently build a structure that implicitly represents the set of all perfect phytogenies and to randomly sample from that set.

Original languageEnglish (US)
Pages (from-to)1749-1763
Number of pages15
JournalSIAM Journal on Computing
Volume26
Issue number6
DOIs
StatePublished - Dec 1997
Externally publishedYes

Fingerprint

Phylogeny
Enumeration
Fast Algorithm
Polynomials
Biology
Polynomial time
Alignment
NP-complete problem
Character

Keywords

  • Combinatorial enumeration
  • Dynamic programming
  • Evolutionary trees
  • Perfect phylogeny
  • Polynomial delay algorithms

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Applied Mathematics
  • Theoretical Computer Science

Cite this

A fast algorithm for the computation and enumeration of perfect phylogenies. / Kannan, Sampath; Warnow, Tandy.

In: SIAM Journal on Computing, Vol. 26, No. 6, 12.1997, p. 1749-1763.

Research output: Contribution to journalArticle

@article{5e22b255e91941f3b4fd5f1a17717cc9,
title = "A fast algorithm for the computation and enumeration of perfect phylogenies",
abstract = "The perfect phylogeny problem is a classical problem in computational evolutionary biology in which a set of species/taxa is described by a set of qualitative characters. In recent years, the problem has been shown to be NP-complete in general, while the different fixed parameter versions can each be solved in polynomial time. In particular, Agarwala and Fern{\'a}ndez-Baca have developed an O(23r(nk3 + k4)) algorithm for the perfect phylogeny problem for n species defined by k r-state characters [SIAM J. Comput., 23 (1994), pp. 1216-1224]. Since, commonly, the character data are drawn from alignments of molecular sequences, k is the length of the sequences and can thus be very large (in the hundreds or thousands). Thus, it is imperative to develop algorithms which run efficiently for large values of k. In this paper we make additional observations about the structure of the problem and produce an algorithm for the problem that runs in time O(22rk2n). We also show how it is possible to efficiently build a structure that implicitly represents the set of all perfect phytogenies and to randomly sample from that set.",
keywords = "Combinatorial enumeration, Dynamic programming, Evolutionary trees, Perfect phylogeny, Polynomial delay algorithms",
author = "Sampath Kannan and Tandy Warnow",
year = "1997",
month = "12",
doi = "10.1137/S0097539794279067",
language = "English (US)",
volume = "26",
pages = "1749--1763",
journal = "SIAM Journal on Computing",
issn = "0097-5397",
publisher = "Society for Industrial and Applied Mathematics Publications",
number = "6",

}

TY - JOUR

T1 - A fast algorithm for the computation and enumeration of perfect phylogenies

AU - Kannan, Sampath

AU - Warnow, Tandy

PY - 1997/12

Y1 - 1997/12

N2 - The perfect phylogeny problem is a classical problem in computational evolutionary biology in which a set of species/taxa is described by a set of qualitative characters. In recent years, the problem has been shown to be NP-complete in general, while the different fixed parameter versions can each be solved in polynomial time. In particular, Agarwala and Fernández-Baca have developed an O(23r(nk3 + k4)) algorithm for the perfect phylogeny problem for n species defined by k r-state characters [SIAM J. Comput., 23 (1994), pp. 1216-1224]. Since, commonly, the character data are drawn from alignments of molecular sequences, k is the length of the sequences and can thus be very large (in the hundreds or thousands). Thus, it is imperative to develop algorithms which run efficiently for large values of k. In this paper we make additional observations about the structure of the problem and produce an algorithm for the problem that runs in time O(22rk2n). We also show how it is possible to efficiently build a structure that implicitly represents the set of all perfect phytogenies and to randomly sample from that set.

AB - The perfect phylogeny problem is a classical problem in computational evolutionary biology in which a set of species/taxa is described by a set of qualitative characters. In recent years, the problem has been shown to be NP-complete in general, while the different fixed parameter versions can each be solved in polynomial time. In particular, Agarwala and Fernández-Baca have developed an O(23r(nk3 + k4)) algorithm for the perfect phylogeny problem for n species defined by k r-state characters [SIAM J. Comput., 23 (1994), pp. 1216-1224]. Since, commonly, the character data are drawn from alignments of molecular sequences, k is the length of the sequences and can thus be very large (in the hundreds or thousands). Thus, it is imperative to develop algorithms which run efficiently for large values of k. In this paper we make additional observations about the structure of the problem and produce an algorithm for the problem that runs in time O(22rk2n). We also show how it is possible to efficiently build a structure that implicitly represents the set of all perfect phytogenies and to randomly sample from that set.

KW - Combinatorial enumeration

KW - Dynamic programming

KW - Evolutionary trees

KW - Perfect phylogeny

KW - Polynomial delay algorithms

UR - http://www.scopus.com/inward/record.url?scp=0008827068&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0008827068&partnerID=8YFLogxK

U2 - 10.1137/S0097539794279067

DO - 10.1137/S0097539794279067

M3 - Article

AN - SCOPUS:0008827068

VL - 26

SP - 1749

EP - 1763

JO - SIAM Journal on Computing

JF - SIAM Journal on Computing

SN - 0097-5397

IS - 6

ER -