### Abstract

Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species and seeks to find an edge-weighted tree T in which the distance d_{ij}^{T} in the tree between the leaves of T corresponding to the species i and j fits the observed distance, d_{ij}. Sometimes the desired tree is ultrametric, so that the tree can be rooted with the root equidistant to each leaf. Many measures for evaluating the 'fit' between a distance function d and the path-distance d^{T} have been proposed, and most such measures have resulted in NP-hard optimization problems. In this paper we propose a measure of fit which models the inaccuracy in the data, and present several problems for constructing additive and ultrametric trees using this measure. Many of the resultant optimization problems are NP-hard, and one (finding a minimum size ultrametric tree which increments from an input matrix) is hard to approximate. Specifically, there is a constant c > 0 such that unless P = NP, no polynomial time algorithm can exist which finds an approximate solution within the ratio of n^{c}. However, we also present tight upper and lower bounds for the L^{∞}-Minimum Increment to Ultrametric. Thus, we present perhaps the first algorithm for constructing phylogenetic trees from distance matrices which finds optimal trees for a reasonable criterion in polynomial time.
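To make the terms in the abstract concrete, here is a minimal Python sketch of two checks. It is not the paper's algorithm. It only tests whether a distance matrix is ultrametric, using the standard three-point condition (in every triple of leaves, the two largest pairwise distances are equal), and measures the L^{∞} increment of a candidate matrix over the observed one. The function names `is_ultrametric` and `linf_increment` are hypothetical. Reading "increment" as "the tree distance is never smaller than the observed distance" is an assumption drawn from the abstract's wording.

```python
from itertools import combinations


def is_ultrametric(d, tol=1e-9):
    """Three-point condition: in every triple of leaves, the two largest
    pairwise distances must be equal (up to a tolerance)."""
    n = len(d)
    for i, j, k in combinations(range(n), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        if largest - middle > tol:
            return False
    return True


def linf_increment(d, u):
    """Largest entrywise increase from d to u.

    Returns None if u is smaller than d anywhere, i.e. u is not an
    increment of d (assumed reading of "increment": u >= d entrywise).
    """
    n = len(d)
    worst = 0.0
    for i, j in combinations(range(n), 2):
        diff = u[i][j] - d[i][j]
        if diff < 0:
            return None
        worst = max(worst, diff)
    return worst


# Hypothetical 3-species example: d is the observed matrix, u a candidate.
d = [[0, 2, 5],
     [2, 0, 6],
     [5, 6, 0]]
u = [[0, 2, 6],
     [2, 0, 6],
     [6, 6, 0]]

print(is_ultrametric(d))      # d violates the three-point condition
print(is_ultrametric(u))      # u satisfies it
print(linf_increment(d, u))   # largest amount by which u raises an entry of d
```

In this toy instance, raising the single entry d_{13} from 5 to 6 is enough to make the matrix ultrametric, so the candidate's L^{∞} increment is 1.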

Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 25th Annual ACM Symposium on Theory of Computing, STOC 1993 |
Publisher | Association for Computing Machinery |
Pages | 137-145 |
Number of pages | 9 |
ISBN (Electronic) | 0897915917 |
DOIs | https://doi.org/10.1145/167088.167132 |
State | Published - Jun 1 1993 |
Event | 25th Annual ACM Symposium on Theory of Computing, STOC 1993 - San Diego, United States; Duration: May 16 1993 → May 18 1993 |

### Publication series

Name | Proceedings of the Annual ACM Symposium on Theory of Computing |
---|---|
Volume | Part F129585 |
ISSN (Print) | 0737-8017 |

### Other

Other | 25th Annual ACM Symposium on Theory of Computing, STOC 1993 |
---|---|
Country | United States |
City | San Diego |
Period | 5/16/93 → 5/18/93 |

### ASJC Scopus subject areas

- Software

### Cite this

Farach, M., Kannan, S., & Warnow, T. (1993). A robust model for finding optimal evolutionary trees extended abstract. In *Proceedings of the 25th Annual ACM Symposium on Theory of Computing, STOC 1993* (pp. 137-145). (Proceedings of the Annual ACM Symposium on Theory of Computing; Vol. Part F129585). Association for Computing Machinery. https://doi.org/10.1145/167088.167132

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - A robust model for finding optimal evolutionary trees extended abstract

AU - Farach, Martin

AU - Kannan, Sampath

AU - Warnow, Tandy

PY - 1993/6/1

Y1 - 1993/6/1

AB - Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species and seeks to find an edge-weighted tree T in which the distance d_{ij}^{T} in the tree between the leaves of T corresponding to the species i and j fits the observed distance, d_{ij}. Sometimes the desired tree is ultrametric, so that the tree can be rooted with the root equidistant to each leaf. Many measures for evaluating the 'fit' between a distance function d and the path-distance d^{T} have been proposed, and most such measures have resulted in NP-hard optimization problems. In this paper we propose a measure of fit which models the inaccuracy in the data, and present several problems for constructing additive and ultrametric trees using this measure. Many of the resultant optimization problems are NP-hard, and one (finding a minimum size ultrametric tree which increments from an input matrix) is hard to approximate. Specifically, there is a constant c > 0 such that unless P = NP, no polynomial time algorithm can exist which finds an approximate solution within the ratio of n^{c}. However, we also present tight upper and lower bounds for the L^{∞}-Minimum Increment to Ultrametric. Thus, we present perhaps the first algorithm for constructing phylogenetic trees from distance matrices which finds optimal trees for a reasonable criterion in polynomial time.

UR - http://www.scopus.com/inward/record.url?scp=0027307380&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0027307380&partnerID=8YFLogxK

U2 - 10.1145/167088.167132

DO - 10.1145/167088.167132

M3 - Conference contribution

AN - SCOPUS:0027307380

T3 - Proceedings of the Annual ACM Symposium on Theory of Computing

SP - 137

EP - 145

BT - Proceedings of the 25th Annual ACM Symposium on Theory of Computing, STOC 1993

PB - Association for Computing Machinery

ER -