### Abstract

Profile Hidden Markov Models (HMMs) are graphical models that can be used to produce finite-length sequences from a distribution. In fact, although they were only introduced for bioinformatics 25 years ago (by Haussler et al., Hawaii International Conference on Systems Science 1993), they are arguably the most commonly used statistical model in bioinformatics, with multiple applications, including protein structure and function prediction, classification of novel proteins into existing protein families and superfamilies, metagenomics, and multiple sequence alignment. The standard use of profile HMMs in bioinformatics has two steps: first a profile HMM is built for a collection of molecular sequences (which may not be in a multiple sequence alignment), and then the profile HMM is used in some subsequent analysis of new molecular sequences. The construction of the profile is thus itself a statistical estimation problem, since any given set of sequences might potentially fit more than one model well. Hence a basic question about profile HMMs is whether they are *statistically identifiable*, which means that no two profile HMMs can produce the same distribution on finite-length sequences. Indeed, statistical identifiability is a fundamental aspect of any statistical model, and yet it is not known whether profile HMMs are statistically identifiable. In this paper, we report on preliminary results towards characterizing the statistical identifiability of profile HMMs in one of the standard forms used in bioinformatics.

Original language | English (US) |
---|---|
Title of host publication | ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics |
Publisher | Association for Computing Machinery, Inc |
Pages | 448-456 |
Number of pages | 9 |
ISBN (Electronic) | 9781450357944 |
DOIs | https://doi.org/10.1145/3233547.3233563 |
State | Published - Aug 15 2018 |
Event | 9th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2018 - Washington, United States. Duration: Aug 29 2018 → Sep 1 2018 |

### Publication series

Name | ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics |
---|---|

### Other

Other | 9th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics, ACM-BCB 2018 |
---|---|
Country | United States |
City | Washington |
Period | 8/29/18 → 9/1/18 |

### Keywords

- Molecular sequence analysis
- Profile hidden Markov models
- Statistical identifiability

### ASJC Scopus subject areas

- Computer Science Applications
- Software
- Health Informatics
- Biomedical Engineering

### Cite this

Pattabiraman, S., & Warnow, T. (2018). Are Profile Hidden Markov Models Identifiable? In *ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics* (pp. 448-456). Association for Computing Machinery, Inc. https://doi.org/10.1145/3233547.3233563

**Are Profile Hidden Markov Models Identifiable?** / Pattabiraman, Srilakshmi; Warnow, Tandy.

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Are Profile Hidden Markov Models Identifiable?

AU - Pattabiraman, Srilakshmi

AU - Warnow, Tandy

PY - 2018/8/15

Y1 - 2018/8/15

KW - Molecular sequence analysis

KW - Profile hidden Markov models

KW - Statistical identifiability

UR - http://www.scopus.com/inward/record.url?scp=85056088500&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85056088500&partnerID=8YFLogxK

U2 - 10.1145/3233547.3233563

DO - 10.1145/3233547.3233563

M3 - Conference contribution

T3 - ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

SP - 448

EP - 456

BT - ACM-BCB 2018 - Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics

PB - Association for Computing Machinery, Inc

ER -