Indelign: A probabilistic framework for annotation of insertions and deletions in a multiple alignment

Jaebum Kim, Saurabh Sinha

Research output: Contribution to journal › Article › peer-review

Abstract

Motivation: A quantitative study of molecular evolutionary events such as substitutions, insertions and deletions from closely related genomes requires (1) an accurate multiple sequence alignment program and (2) a method to annotate the insertions and deletions that explain the 'gaps' in the alignment. Although the former requirement has been extensively addressed, the latter problem has received little attention, especially in a comprehensive probabilistic framework.

Results: Here, we present Indelign, a program that uses a probabilistic evolutionary model to compute the most likely scenario of insertions and deletions consistent with an input multiple alignment. It is also capable of modifying the given alignment so as to obtain a better agreement with the evolutionary model. We find close to optimal performance and substantial improvement over alternative methods, in tests of Indelign on synthetic data. We use Indelign to analyze regulatory sequences in Drosophila, and find an excess of insertions over deletions, which is different from what has been reported for neutral sequences.

Original language: English (US)
Pages (from-to): 289-297
Number of pages: 9
Journal: Bioinformatics
Volume: 23
Issue number: 3
State: Published - Feb 1 2007

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics
