A robust model for finding optimal evolutionary trees

M. Farach, S. Kannan, T. Warnow

Research output: Contribution to journalArticle

Abstract

Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species, and seeks to find an edge-weighted tree T in which the distance d ij T in the tree between the leaves of T corresponding to the species i and j exactly equals the observed distance, d ij . When such a tree exists, this is expressed in the biological literature by saying that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in 0(n 2) time. Real distance data is hardly ever additive, and we therefore need ways of modeling the problem of finding the best-fit tree as an optimization problem. In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distance matrix between these bounds. In a second model we are given a partial matrix and asked to find if we can fill in the unspecified entries in order to make the entire matrix additive. For both of these models we also consider a more restrictive problem of finding a matrix that fits a tree which is not only additive but also ultrametric. Ultrametric matrices correspond to trees which can be rooted so that the distance from the root to any leaf is the same. Ultrametric matrices are desirable in biology since the edge weights then indicate evolutionary time. We give polynomial-time algorithms for some of the problems while showing others to be NP-complete. We also consider various ways of "fitting" a given distance matrix (or a pair of upper- and lower-bound matrices) to a tree in order to minimize various criteria of error in the fit. For most criteria this optimization problem turns out to be NP-hard, while we do get polynomial-time algorithms for some.

Original languageEnglish (US)
Pages (from-to)155-179
Number of pages25
JournalAlgorithmica
Volume13
Issue number1-2
DOIs
StatePublished - Feb 1 1995
Externally publishedYes

Fingerprint

Evolutionary Tree
Distance Matrix
Model
Polynomial-time Algorithm
Upper and Lower Bounds
Leaves
NP-complete problem
Partial Matrix
Optimization Problem
Polynomials
Computational Biology
Distance Function
Modeling
Biology
Standard Model
Roots
Entire
Minimise

Keywords

  • Additive trees
  • Clustering
  • Distance-based methods
  • Efficient algorithms
  • Evolutionary tree reconstruction
  • NP-completeness
  • Ultrametric trees

ASJC Scopus subject areas

  • Applied Mathematics
  • Safety, Risk, Reliability and Quality
  • Software
  • Computer Graphics and Computer-Aided Design

Cite this

A robust model for finding optimal evolutionary trees. / Farach, M.; Kannan, S.; Warnow, T.

In: Algorithmica, Vol. 13, No. 1-2, 01.02.1995, p. 155-179.

Research output: Contribution to journalArticle

Farach, M. ; Kannan, S. ; Warnow, T. / A robust model for finding optimal evolutionary trees. In: Algorithmica. 1995 ; Vol. 13, No. 1-2. pp. 155-179.
@article{1b7e2e0a8b73409fbf91fe50b1e180ff,
title = "A robust model for finding optimal evolutionary trees",
abstract = "Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species, and seeks to find an edge-weighted tree T in which the distance d ij T in the tree between the leaves of T corresponding to the species i and j exactly equals the observed distance, d ij . When such a tree exists, this is expressed in the biological literature by saying that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in 0(n 2) time. Real distance data is hardly ever additive, and we therefore need ways of modeling the problem of finding the best-fit tree as an optimization problem. In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distance matrix between these bounds. In a second model we are given a partial matrix and asked to find if we can fill in the unspecified entries in order to make the entire matrix additive. For both of these models we also consider a more restrictive problem of finding a matrix that fits a tree which is not only additive but also ultrametric. Ultrametric matrices correspond to trees which can be rooted so that the distance from the root to any leaf is the same. Ultrametric matrices are desirable in biology since the edge weights then indicate evolutionary time. We give polynomial-time algorithms for some of the problems while showing others to be NP-complete. We also consider various ways of {"}fitting{"} a given distance matrix (or a pair of upper- and lower-bound matrices) to a tree in order to minimize various criteria of error in the fit. For most criteria this optimization problem turns out to be NP-hard, while we do get polynomial-time algorithms for some.",
keywords = "Additive trees, Clustering, Distance-based methods, Efficient algorithms, Evolutionary tree reconstruction, NP-completeness, Ultrametric trees",
author = "M. Farach and S. Kannan and T. Warnow",
year = "1995",
month = "2",
day = "1",
doi = "10.1007/BF01188585",
language = "English (US)",
volume = "13",
pages = "155--179",
journal = "Algorithmica (New York)",
issn = "0178-4617",
publisher = "Springer New York",
number = "1-2",

}

TY - JOUR

T1 - A robust model for finding optimal evolutionary trees

AU - Farach, M.

AU - Kannan, S.

AU - Warnow, T.

PY - 1995/2/1

Y1 - 1995/2/1

N2 - Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species, and seeks to find an edge-weighted tree T in which the distance d ij T in the tree between the leaves of T corresponding to the species i and j exactly equals the observed distance, d ij . When such a tree exists, this is expressed in the biological literature by saying that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in 0(n 2) time. Real distance data is hardly ever additive, and we therefore need ways of modeling the problem of finding the best-fit tree as an optimization problem. In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distance matrix between these bounds. In a second model we are given a partial matrix and asked to find if we can fill in the unspecified entries in order to make the entire matrix additive. For both of these models we also consider a more restrictive problem of finding a matrix that fits a tree which is not only additive but also ultrametric. Ultrametric matrices correspond to trees which can be rooted so that the distance from the root to any leaf is the same. Ultrametric matrices are desirable in biology since the edge weights then indicate evolutionary time. We give polynomial-time algorithms for some of the problems while showing others to be NP-complete. We also consider various ways of "fitting" a given distance matrix (or a pair of upper- and lower-bound matrices) to a tree in order to minimize various criteria of error in the fit. For most criteria this optimization problem turns out to be NP-hard, while we do get polynomial-time algorithms for some.

AB - Constructing evolutionary trees for species sets is a fundamental problem in computational biology. One of the standard models assumes the ability to compute distances between every pair of species, and seeks to find an edge-weighted tree T in which the distance d ij T in the tree between the leaves of T corresponding to the species i and j exactly equals the observed distance, d ij . When such a tree exists, this is expressed in the biological literature by saying that the distance function or matrix is additive, and trees can be constructed from additive distance matrices in 0(n 2) time. Real distance data is hardly ever additive, and we therefore need ways of modeling the problem of finding the best-fit tree as an optimization problem. In this paper we present several natural and realistic ways of modeling the inaccuracies in the distance data. In one model we assume that we have upper and lower bounds for the distances between pairs of species and try to find an additive distance matrix between these bounds. In a second model we are given a partial matrix and asked to find if we can fill in the unspecified entries in order to make the entire matrix additive. For both of these models we also consider a more restrictive problem of finding a matrix that fits a tree which is not only additive but also ultrametric. Ultrametric matrices correspond to trees which can be rooted so that the distance from the root to any leaf is the same. Ultrametric matrices are desirable in biology since the edge weights then indicate evolutionary time. We give polynomial-time algorithms for some of the problems while showing others to be NP-complete. We also consider various ways of "fitting" a given distance matrix (or a pair of upper- and lower-bound matrices) to a tree in order to minimize various criteria of error in the fit. For most criteria this optimization problem turns out to be NP-hard, while we do get polynomial-time algorithms for some.

KW - Additive trees

KW - Clustering

KW - Distance-based methods

KW - Efficient algorithms

KW - Evolutionary tree reconstruction

KW - NP-completeness

KW - Ultrametric trees

UR - http://www.scopus.com/inward/record.url?scp=0029192324&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0029192324&partnerID=8YFLogxK

U2 - 10.1007/BF01188585

DO - 10.1007/BF01188585

M3 - Article

AN - SCOPUS:0029192324

VL - 13

SP - 155

EP - 179

JO - Algorithmica (New York)

JF - Algorithmica (New York)

SN - 0178-4617

IS - 1-2

ER -