Acquiring compact lexicalized grammars from a cleaner treebank

Julia Hockenmaier, Mark Steedman

Research output: Contribution to conference › Paper › peer-review

Abstract

We present an algorithm which translates the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations. To do this, we have needed to make several systematic changes to the Treebank, which have the effect of cleaning up a number of errors and inconsistencies. This process has yielded a cleaner treebank that can potentially be used in any framework. We also show how unary type-changing rules for certain types of modifiers can be introduced in a CCG grammar to ensure a compact lexicon without augmenting the generative power of the system. We demonstrate how the combination of preprocessing and type-changing rules minimizes the lexical coverage problem.

Original language: English (US)
Pages: 1974-1981
Number of pages: 8
State: Published - 2002
Externally published: Yes
Event: 3rd International Conference on Language Resources and Evaluation, LREC 2002 - Las Palmas, Canary Islands, Spain
Duration: May 29 2002 to May 31 2002

Other

Other: 3rd International Conference on Language Resources and Evaluation, LREC 2002
Country: Spain
City: Las Palmas, Canary Islands
Period: 5/29/02 to 5/31/02

ASJC Scopus subject areas

  • Linguistics and Language
  • Language and Linguistics
  • Education
  • Library and Information Sciences
