The impact of multiple protein sequence alignment on phylogenetic estimation

Li San Wang, Jim Leebens-Mack, P. Kerr Wall, Kevin Beckmann, Claude W. Depamphilis, Tandy Warnow

Research output: Contribution to journal › Article

Abstract

Multiple sequence alignment is typically the first step in estimating phylogenetic trees, with the assumption being that as alignments improve, so will phylogenetic reconstructions. Over the last decade or so, new multiple sequence alignment methods have been developed to improve comparative analyses of protein structure, but these new methods have not been typically used in phylogenetic analyses. In this paper, we report on a simulation study that we performed to evaluate the consequences of using these new multiple sequence alignment methods in terms of the resultant phylogenetic reconstruction. We find that while alignment accuracy is positively correlated with phylogenetic accuracy, the amount of improvement in phylogenetic estimation that results from an improved alignment can range from quite small to substantial. We observe that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low. We discuss these observations and implications for future work.

Original language: English (US)
Article number: 5235137
Pages (from-to): 1108-1119
Number of pages: 12
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Volume: 8
Issue number: 4
DOI: 10.1109/TCBB.2009.68
State: Published - Mar 22 2011

Fingerprint

Sequence Alignment
Phylogenetics
Protein Sequence
Alignment
Proteins
Multiple Sequence Alignment
Phylogenetic Tree
Protein Structure
Error Rate
Simulation Study
Evaluate
Range of data

Keywords

  • biology and genetics
  • multiple protein sequence alignment
  • phylogeny reconstruction
  • Simulation

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Applied Mathematics
  • Medicine (all)

Cite this

The impact of multiple protein sequence alignment on phylogenetic estimation. / Wang, Li San; Leebens-Mack, Jim; Wall, P. Kerr; Beckmann, Kevin; Depamphilis, Claude W.; Warnow, Tandy.

In: IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 8, No. 4, 5235137, 22.03.2011, p. 1108-1119.

@article{15023b66b1d64c94b35b2b07538c3ec4,
title = "The impact of multiple protein sequence alignment on phylogenetic estimation",
keywords = "biology and genetics, multiple protein sequence alignment, phylogeny reconstruction, Simulation",
author = "Wang, {Li San} and Jim Leebens-Mack and Wall, {P. Kerr} and Kevin Beckmann and Depamphilis, {Claude W.} and Tandy Warnow",
year = "2011",
month = "3",
day = "22",
doi = "10.1109/TCBB.2009.68",
language = "English (US)",
volume = "8",
pages = "1108--1119",
journal = "IEEE/ACM Transactions on Computational Biology and Bioinformatics",
issn = "1545-5963",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
number = "4",

}

TY - JOUR

T1 - The impact of multiple protein sequence alignment on phylogenetic estimation

AU - Wang, Li San

AU - Leebens-Mack, Jim

AU - Wall, P. Kerr

AU - Beckmann, Kevin

AU - Depamphilis, Claude W.

AU - Warnow, Tandy

PY - 2011/3/22

Y1 - 2011/3/22

KW - biology and genetics

KW - multiple protein sequence alignment

KW - phylogeny reconstruction

KW - Simulation

UR - http://www.scopus.com/inward/record.url?scp=79957605776&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79957605776&partnerID=8YFLogxK

U2 - 10.1109/TCBB.2009.68

DO - 10.1109/TCBB.2009.68

M3 - Article

C2 - 21566256

AN - SCOPUS:79957605776

VL - 8

SP - 1108

EP - 1119

JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics

JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics

SN - 1545-5963

IS - 4

M1 - 5235137

ER -