Barking up the wrong treelength: The impact of gap penalty on alignment and tree accuracy

Kevin Liu, Serita Nelesen, Sindhu Raghavan, C. Randal Linder, Tandy Warnow

Research output: Contribution to journal › Article

Abstract

Several methods have been developed for simultaneous estimation of alignment and tree, of which POY is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to estimating the alignment using ClustalW and then analyzing the resultant alignment using maximum parsimony. They found that ClustalW+MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques are not competitive with two-phase techniques. Our paper presents a simulation study in which we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Our study considers the impact of the gap penalty and suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs. Our study suggests that optimizing under an affine gap penalty might produce alignments that are better than ClustalW alignments, and competitive with those produced by the best current alignment methods. We also show that optimizing under this affine gap penalty produces trees whose topological accuracy is better than ClustalW+MP, and competitive with the current best two-phase methods.
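
The abstract turns on two ideas: the treelength that POY minimizes, and how a simple gap penalty differs from an affine one when scoring alignment/tree pairs. The Python sketch below is a minimal illustration under stated assumptions, not the authors' or POY's implementation. It assumes every node (leaves and internal nodes) already has an aligned row, so it only scores a fixed solution and does not attempt the NP-hard search. It uses the convention that a gap run of length L costs gap_open + gap_extend * L, so a simple (linear) penalty is the special case gap_open = 0. The names `alignment_cost` and `treelength` and the values mismatch=1, gap_extend=1, gap_open=4 are hypothetical choices, not settings from the paper, from POY, or from Ogden and Rosenberg.

```python
# Sketch only: scores FIXED aligned rows on a FIXED tree; it does not search for
# the minimum-treelength solution. Parameter values are hypothetical.

def alignment_cost(row_a: str, row_b: str, mismatch: int = 1,
                   gap_open: int = 0, gap_extend: int = 1) -> int:
    """Cost of one pairwise (edge) alignment; a gap run of length L costs gap_open + gap_extend * L."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    cost = 0
    state = None  # which row is inside an open gap run: "a", "b", or None
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            continue  # all-gap column: dropped from the induced pairwise alignment
        if x == "-" or y == "-":
            side = "a" if x == "-" else "b"
            if side != state:
                cost += gap_open  # a new gap run begins in this column
            cost += gap_extend  # every gapped position pays the extension cost
            state = side
        else:
            if x != y:
                cost += mismatch  # substitution
            state = None  # a residue-residue column closes any open gap run
    return cost


def treelength(edges, rows, **penalty) -> int:
    """Sum of edge costs; rows maps every node (leaf or internal) to its aligned row."""
    return sum(alignment_cost(rows[u], rows[v], **penalty) for u, v in edges)


# Hypothetical star tree: internal node "x" joined to three leaves.
rows = {"x": "AAAA", "s1": "AAAA", "s2": "AA--", "s3": "A-A-"}
edges = [("x", "s1"), ("x", "s2"), ("x", "s3")]
print(treelength(edges, rows))              # simple penalty: 4
print(treelength(edges, rows, gap_open=4))  # affine penalty: 16
```

Under the simple penalty the contiguous gap ("AA--") and the scattered gaps ("A-A-") tie at cost 2, so the score cannot tell them apart; the affine penalty charges 6 and 10 respectively and therefore prefers the single contiguous gap.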

Original language: English (US)
Article number: 4547425
Pages (from-to): 7-21
Number of pages: 15
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 6
Issue number: 1
DOI: 10.1109/TCBB.2008.63
State: Published - Jan 1 2009
Externally published: Yes

Keywords

  • Biology and genetics
  • Markov processes

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics
  • Medicine (all)

Cite this

Barking up the wrong treelength: The impact of gap penalty on alignment and tree accuracy. / Liu, Kevin; Nelesen, Serita; Raghavan, Sindhu; Linder, C. Randal; Warnow, Tandy.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 6, No. 1, 4547425, 01.01.2009, p. 7-21.

@article{ae2a0ee6d8c446a190a726f1a44485ba,
title = "Barking up the wrong treelength: The impact of gap penalty on alignment and tree accuracy",
abstract = "Several methods have been developed for simultaneous estimation of alignment and tree, of which POY is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to estimating the alignment using ClustalW and then analyzing the resultant alignment using maximum parsimony. They found that ClustalW+MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques are not competitive with two-phase techniques. Our paper presents a simulation study in which we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Our study considers the impact of the gap penalty and suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs. Our study suggests that optimizing under an affine gap penalty might produce alignments that are better than ClustalW alignments, and competitive with those produced by the best current alignment methods. We also show that optimizing under this affine gap penalty produces trees whose topological accuracy is better than ClustalW+MP, and competitive with the current best two-phase methods.",
keywords = "Biology and genetics, Markov processes",
author = "Kevin Liu and Serita Nelesen and Sindhu Raghavan and Linder, {C. Randal} and Tandy Warnow",
year = "2009",
month = "1",
day = "1",
doi = "10.1109/TCBB.2008.63",
language = "English (US)",
volume = "6",
pages = "7--21",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "1",

}

TY  - JOUR
T1  - Barking up the wrong treelength
T2  - The impact of gap penalty on alignment and tree accuracy
AU  - Liu, Kevin
AU  - Nelesen, Serita
AU  - Raghavan, Sindhu
AU  - Linder, C. Randal
AU  - Warnow, Tandy
PY  - 2009/1/1
Y1  - 2009/1/1
N2  - Several methods have been developed for simultaneous estimation of alignment and tree, of which POY is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to estimating the alignment using ClustalW and then analyzing the resultant alignment using maximum parsimony. They found that ClustalW+MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques are not competitive with two-phase techniques. Our paper presents a simulation study in which we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Our study considers the impact of the gap penalty and suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs. Our study suggests that optimizing under an affine gap penalty might produce alignments that are better than ClustalW alignments, and competitive with those produced by the best current alignment methods. We also show that optimizing under this affine gap penalty produces trees whose topological accuracy is better than ClustalW+MP, and competitive with the current best two-phase methods.
AB  - Several methods have been developed for simultaneous estimation of alignment and tree, of which POY is the most popular. In a 2007 paper published in Systematic Biology, Ogden and Rosenberg reported on a simulation study in which they compared POY to estimating the alignment using ClustalW and then analyzing the resultant alignment using maximum parsimony. They found that ClustalW+MP outperformed POY with respect to alignment and phylogenetic tree accuracy, and they concluded that simultaneous estimation techniques are not competitive with two-phase techniques. Our paper presents a simulation study in which we focus on the NP-hard optimization problem that POY addresses: minimizing treelength. Our study considers the impact of the gap penalty and suggests that the poor performance observed for POY by Ogden and Rosenberg is due to the simple gap penalties they used to score alignment/tree pairs. Our study suggests that optimizing under an affine gap penalty might produce alignments that are better than ClustalW alignments, and competitive with those produced by the best current alignment methods. We also show that optimizing under this affine gap penalty produces trees whose topological accuracy is better than ClustalW+MP, and competitive with the current best two-phase methods.
KW  - Biology and genetics
KW  - Markov processes
UR  - http://www.scopus.com/inward/record.url?scp=59649130312&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=59649130312&partnerID=8YFLogxK
U2  - 10.1109/TCBB.2008.63
DO  - 10.1109/TCBB.2008.63
M3  - Article
C2  - 19179695
AN  - SCOPUS:59649130312
VL  - 6
SP  - 7
EP  - 21
JO  - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF  - IEEE/ACM Transactions on Computational Biology and Bioinformatics
SN  - 1545-5963
IS  - 1
M1  - 4547425
ER  -