PSAR: Measuring multiple sequence alignment reliability by probabilistic sampling

Jaebum Kim, Jian Ma

Research output: Contribution to journalArticlepeer-review

Abstract

Multiple sequence alignment, which is of fundamental importance for comparative genomics, is a difficult problem and error-prone. Therefore, it is essential to measure the reliability of the alignments and incorporate it into downstream analyses. We propose a new probabilistic sampling-based alignment reliability (PSAR) score. Instead of relying on heuristic assumptions, such as the correlation between alignment quality and guide tree uncertainty in progressive alignment methods, we directly generate suboptimal alignments from an input multiple sequence alignment by a probabilistic sampling method, and compute the agreement of the input alignment with the suboptimal alignments as the alignment reliability score. We construct the suboptimal alignments by an approximate method that is based on pairwise comparisons between each single sequence and the sub-alignment of the input alignment where the chosen sequence is left out. By using simulation-based benchmarks, we find that our approach is superior to existing ones, supporting that the suboptimal alignments are highly informative source for assessing alignment reliability. We apply the PSAR method to the alignments in the UCSC Genome Browser to measure the reliability of alignments in different types of regions, such as coding exons and conserved non-coding regions, and use it to guide cross-species conservation study.

Original languageEnglish (US)
Pages (from-to)6359-6368
Number of pages10
JournalNucleic acids research
Volume39
Issue number15
DOIs
StatePublished - Aug 2011
Externally publishedYes

ASJC Scopus subject areas

  • Genetics

Fingerprint

Dive into the research topics of 'PSAR: Measuring multiple sequence alignment reliability by probabilistic sampling'. Together they form a unique fingerprint.

Cite this