Abstract
We present an algorithm that translates the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations. Doing so required several systematic changes to the Treebank, which have the effect of cleaning up a number of errors and inconsistencies. This process has yielded a cleaner treebank that can potentially be used in any framework. We also show how unary type-changing rules for certain types of modifiers can be introduced into a CCG grammar to ensure a compact lexicon without increasing the generative power of the system. We demonstrate how the combination of preprocessing and type-changing rules minimizes the lexical coverage problem.
Original language | English (US) |
---|---|
Pages | 1974-1981 |
Number of pages | 8 |
State | Published - 2002 |
Externally published | Yes |
Event | 3rd International Conference on Language Resources and Evaluation, LREC 2002 - Las Palmas, Canary Islands, Spain (Duration: May 29 2002 → May 31 2002) |
Other
Other | 3rd International Conference on Language Resources and Evaluation, LREC 2002 |
---|---|
Country/Territory | Spain |
City | Las Palmas, Canary Islands |
Period | 5/29/02 → 5/31/02 |
ASJC Scopus subject areas
- Linguistics and Language
- Language and Linguistics
- Education
- Library and Information Sciences