Unsupervised learning of PCFGs with normalizing flow

Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Unsupervised PCFG inducers hypothesize sets of compact context-free rules as explanations for sentences. These models not only provide tools for low-resource languages, but also play an important role in modeling language acquisition (Bannard et al., 2009; Abend et al., 2017). However, current PCFG induction models, using word tokens as input, are unable to incorporate semantics and morphology into induction, and may encounter issues of sparse vocabulary when facing morphologically rich languages. This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information. Linguistically motivated similarity penalty and categorical distance constraints are imposed on the inducer as regularization. Experiments show that the PCFG induction model with normalizing flow produces grammars with state-of-the-art accuracy on a variety of different languages. Ablation further shows a positive effect of normalizing flow, context embeddings and proposed regularizers.

Original language: English (US)
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 2442-2452
Number of pages: 11
ISBN (Electronic): 9781950737482
State: Published - 2020
Event: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Florence, Italy
Duration: Jul 28 2019 to Aug 2 2019

Publication series

Name: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Country: Italy
City: Florence
Period: 7/28/19 to 8/2/19

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (all)
  • Linguistics and Language
