A PAC-Bayesian approach to minimum perplexity language modeling

Sujeeth Bharadwaj, Mark Hasegawa-Johnson

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Despite the overwhelming use of statistical language models in speech recognition, machine translation, and several other domains, few high probability guarantees exist on their generalization error. In this paper, we bound the test set perplexity of two popular language models - The n-gram model and class-based n-grams - using PAC-Bayesian theorems for unsupervised learning. We extend the bound to sequence clustering, wherein classes represent longer context such as phrases. The new bound is dominated by the maximum number of sequences represented by each cluster, which is polynomial in the vocabulary size. We show that we can still encourage small sample generalization by sparsifying the cluster assignment probabilities. We incorporate our bound into an efficient HMM-based sequence clustering algorithm and validate the theory with empirical results on the resource management corpus.

Original languageEnglish (US)
Title of host publicationCOLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014
Subtitle of host publicationTechnical Papers
PublisherAssociation for Computational Linguistics, ACL Anthology
Pages130-140
Number of pages11
ISBN (Electronic)9781941643266
StatePublished - 2014
Event25th International Conference on Computational Linguistics, COLING 2014 - Dublin, Ireland
Duration: Aug 23 2014Aug 29 2014

Publication series

NameCOLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers

Other

Other25th International Conference on Computational Linguistics, COLING 2014
Country/TerritoryIreland
CityDublin
Period8/23/148/29/14

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'A PAC-Bayesian approach to minimum perplexity language modeling'. Together they form a unique fingerprint.

Cite this