Abstract

Profile Hidden Markov Models (HMMs) are graphical models that can be used to produce finite length sequences from a distribution. In fact, although they were only introduced for bioinformatics 25 years ago (by Haussler et al., Hawaii International Conference on Systems Science 1993), they are arguably the most commonly used statistical model in bioinformatics, with multiple applications, including protein structure and function prediction, classifications of novel proteins into existing protein families and superfamilies, metagenomics, and multiple sequence alignment. The standard use of profile HMMs in bioinformatics has two steps: first a profile HMM is built for a collection of molecular sequences (which may not be in a multiple sequence alignment), and then the profile HMM is used in some subsequent analysis of new molecular sequences. The construction of the profile thus is itself a statistical estimation problem, since any given set of sequences might potentially fit more than one model well. Hence a basic question about profile HMMs is whether they are \em statistically identifiable, which means that no two profile HMMs can produce the same distribution on finite length sequences. Indeed, statistical identifiability is a fundamental aspect of any statistical model, and yet it is not known whether profile HMMs are statistically identifiable. In this paper, we report on preliminary results towards characterizing the statistical identifiability of profile HMMs in one of the standard forms used in bioinformatics.

Original languageEnglish (US)
Title of host publicationACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
PublisherAssociation for Computing Machinery, Inc
Pages448-456
Number of pages9
ISBN (Electronic)9781450357944
DOIs
StatePublished - Aug 15 2018
Event9th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2018 - Washington, United States
Duration: Aug 29 2018Sep 1 2018

Publication series

NameACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

Other

Other9th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2018
CountryUnited States
CityWashington
Period8/29/189/1/18

Fingerprint

Hidden Markov models
Computational Biology
Sequence Alignment
Statistical Models
Bioinformatics
Metagenomics
Proteins
Systems science

Keywords

  • Molecular sequence analysis
  • Profile hidden markov models
  • Statistical identifiability

ASJC Scopus subject areas

  • Computer Science Applications
  • Software
  • Health Informatics
  • Biomedical Engineering

Cite this

Pattabiraman, S., & Warnow, T. (2018). Are Profile Hidden Markov Models Identifiable? In ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 448-456). (ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics). Association for Computing Machinery, Inc. https://doi.org/10.1145/3233547.3233563

Are Profile Hidden Markov Models Identifiable? / Pattabiraman, Srilakshmi; Warnow, Tandy.

ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, Inc, 2018. p. 448-456 (ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Pattabiraman, S & Warnow, T 2018, Are Profile Hidden Markov Models Identifiable? in ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, Association for Computing Machinery, Inc, pp. 448-456, 9th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2018, Washington, United States, 8/29/18. https://doi.org/10.1145/3233547.3233563
Pattabiraman S, Warnow T. Are Profile Hidden Markov Models Identifiable? In ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, Inc. 2018. p. 448-456. (ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics). https://doi.org/10.1145/3233547.3233563
Pattabiraman, Srilakshmi ; Warnow, Tandy. / Are Profile Hidden Markov Models Identifiable?. ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, Inc, 2018. pp. 448-456 (ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics).
@inproceedings{6f09075beed54d2b8b1b3cc968d8e3b1,
title = "Are Profile Hidden Markov Models Identifiable?",
abstract = "Profile Hidden Markov Models (HMMs) are graphical models that can be used to produce finite length sequences from a distribution. In fact, although they were only introduced for bioinformatics 25 years ago (by Haussler et al., Hawaii International Conference on Systems Science 1993), they are arguably the most commonly used statistical model in bioinformatics, with multiple applications, including protein structure and function prediction, classifications of novel proteins into existing protein families and superfamilies, metagenomics, and multiple sequence alignment. The standard use of profile HMMs in bioinformatics has two steps: first a profile HMM is built for a collection of molecular sequences (which may not be in a multiple sequence alignment), and then the profile HMM is used in some subsequent analysis of new molecular sequences. The construction of the profile thus is itself a statistical estimation problem, since any given set of sequences might potentially fit more than one model well. Hence a basic question about profile HMMs is whether they are \em statistically identifiable, which means that no two profile HMMs can produce the same distribution on finite length sequences. Indeed, statistical identifiability is a fundamental aspect of any statistical model, and yet it is not known whether profile HMMs are statistically identifiable. In this paper, we report on preliminary results towards characterizing the statistical identifiability of profile HMMs in one of the standard forms used in bioinformatics.",
keywords = "Molecular sequence analysis, Profile hidden markov models, Statistical identifiability",
author = "Srilakshmi Pattabiraman and Tandy Warnow",
year = "2018",
month = "8",
day = "15",
doi = "10.1145/3233547.3233563",
language = "English (US)",
series = "ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics",
publisher = "Association for Computing Machinery, Inc",
pages = "448--456",
booktitle = "ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics",

}

TY - GEN

T1 - Are Profile Hidden Markov Models Identifiable?

AU - Pattabiraman, Srilakshmi

AU - Warnow, Tandy

PY - 2018/8/15

Y1 - 2018/8/15

N2 - Profile Hidden Markov Models (HMMs) are graphical models that can be used to produce finite length sequences from a distribution. In fact, although they were only introduced for bioinformatics 25 years ago (by Haussler et al., Hawaii International Conference on Systems Science 1993), they are arguably the most commonly used statistical model in bioinformatics, with multiple applications, including protein structure and function prediction, classifications of novel proteins into existing protein families and superfamilies, metagenomics, and multiple sequence alignment. The standard use of profile HMMs in bioinformatics has two steps: first a profile HMM is built for a collection of molecular sequences (which may not be in a multiple sequence alignment), and then the profile HMM is used in some subsequent analysis of new molecular sequences. The construction of the profile thus is itself a statistical estimation problem, since any given set of sequences might potentially fit more than one model well. Hence a basic question about profile HMMs is whether they are \em statistically identifiable, which means that no two profile HMMs can produce the same distribution on finite length sequences. Indeed, statistical identifiability is a fundamental aspect of any statistical model, and yet it is not known whether profile HMMs are statistically identifiable. In this paper, we report on preliminary results towards characterizing the statistical identifiability of profile HMMs in one of the standard forms used in bioinformatics.

AB - Profile Hidden Markov Models (HMMs) are graphical models that can be used to produce finite length sequences from a distribution. In fact, although they were only introduced for bioinformatics 25 years ago (by Haussler et al., Hawaii International Conference on Systems Science 1993), they are arguably the most commonly used statistical model in bioinformatics, with multiple applications, including protein structure and function prediction, classifications of novel proteins into existing protein families and superfamilies, metagenomics, and multiple sequence alignment. The standard use of profile HMMs in bioinformatics has two steps: first a profile HMM is built for a collection of molecular sequences (which may not be in a multiple sequence alignment), and then the profile HMM is used in some subsequent analysis of new molecular sequences. The construction of the profile thus is itself a statistical estimation problem, since any given set of sequences might potentially fit more than one model well. Hence a basic question about profile HMMs is whether they are \em statistically identifiable, which means that no two profile HMMs can produce the same distribution on finite length sequences. Indeed, statistical identifiability is a fundamental aspect of any statistical model, and yet it is not known whether profile HMMs are statistically identifiable. In this paper, we report on preliminary results towards characterizing the statistical identifiability of profile HMMs in one of the standard forms used in bioinformatics.

KW - Molecular sequence analysis

KW - Profile hidden markov models

KW - Statistical identifiability

UR - http://www.scopus.com/inward/record.url?scp=85056088500&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056088500&partnerID=8YFLogxK

U2 - 10.1145/3233547.3233563

DO - 10.1145/3233547.3233563

M3 - Conference contribution

T3 - ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

SP - 448

EP - 456

BT - ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

PB - Association for Computing Machinery, Inc

ER -