TY - JOUR
T1 - The impact of multiple protein sequence alignment on phylogenetic estimation
AU - Wang, Li San
AU - Leebens-Mack, Jim
AU - Wall, P. Kerr
AU - Beckmann, Kevin
AU - dePamphilis, Claude W.
AU - Warnow, Tandy
N1 - Funding Information:
This research was supported in part by the US National Science Foundation (NSF) grants DBI 0638595 and DBI 0115684 to Claude W. dePamphilis and Jim Leebens-Mack, DEB 0733029 to Tandy Warnow and Jim Leebens-Mack, and ITR 0331453, ITR/AP 0121680, and DEB 0120709 to Tandy Warnow. Tandy Warnow was also supported by the Program for Evolutionary Dynamics at Harvard, by the Radcliffe Institute for Fundamental Research, and by the University of Texas Faculty Research program.
Funding Information:
Tandy Warnow received the PhD degree in mathematics at UC Berkeley in 1991 under the direction of Gene Lawler, and did post-doctoral training with Simon Tavare and Michael Waterman at USC. She is professor of computer science at the University of Texas at Austin. Her research combines mathematics, computer science, and statistics to develop improved models and algorithms for reconstructing complex and large-scale evolutionary histories in both biology and historical linguistics. She received the US National Science Foundation (NSF) Young Investigator Award in 1994, and the David and Lucile Packard Foundation Award in science and engineering in 1996.
PY - 2011
Y1 - 2011
AB - Multiple sequence alignment is typically the first step in estimating phylogenetic trees, with the assumption being that as alignments improve, so will phylogenetic reconstructions. Over the last decade or so, new multiple sequence alignment methods have been developed to improve comparative analyses of protein structure, but these new methods have not been typically used in phylogenetic analyses. In this paper, we report on a simulation study that we performed to evaluate the consequences of using these new multiple sequence alignment methods in terms of the resultant phylogenetic reconstruction. We find that while alignment accuracy is positively correlated with phylogenetic accuracy, the amount of improvement in phylogenetic estimation that results from an improved alignment can range from quite small to substantial. We observe that phylogenetic accuracy is most highly correlated with alignment accuracy when sequences are most difficult to align, and that variation in alignment accuracy can have little impact on phylogenetic accuracy when alignment error rates are generally low. We discuss these observations and implications for future work.
KW - biology and genetics
KW - multiple protein sequence alignment
KW - phylogeny reconstruction
KW - simulation
UR - http://www.scopus.com/inward/record.url?scp=79957605776&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79957605776&partnerID=8YFLogxK
U2 - 10.1109/TCBB.2009.68
DO - 10.1109/TCBB.2009.68
M3 - Article
C2 - 21566256
AN - SCOPUS:79957605776
SN - 1545-5963
VL - 8
SP - 1108
EP - 1119
JO - IEEE/ACM Transactions on Computational Biology and Bioinformatics
JF - IEEE/ACM Transactions on Computational Biology and Bioinformatics
IS - 4
M1 - 5235137
ER -