Bootstrapping statistical parsers from small datasets

Mark Steedman, Miles Osborne, Anoop Sarkar, Stephen Clark, Rebecca Hwa, Julia Hockenmaier, Paul Ruhlen, Steven Baker, Jeremiah Crim

Research output: Contribution to conference › Paper › peer-review

Abstract

We present a practical co-training method for bootstrapping statistical parsers using a small amount of manually parsed training material and a much larger pool of raw sentences. Experimental results show that unlabelled sentences can be used to improve the performance of statistical parsers. In addition, we consider the problem of bootstrapping parsers when the manually parsed training material is in a different domain to either the raw sentences or the testing material. We show that bootstrapping continues to be useful, even though no manually produced parses from the target domain are used.
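The abstract describes co-training only at a high level. The sketch below is a generic co-training loop and not the authors' implementation. It assumes two learners that each look at a different "view" of a sentence, start from the same small labelled seed set, and label a pool of raw sentences for each other. Each round, the most confident outputs of one learner are added to the other learner's training data. `ToyViewLearner`, `co_train`, `per_round` and the top-n selection rule are all hypothetical. The word-lookup learners stand in for real statistical parsers, and the string labels stand in for parse trees.

```python
from collections import Counter, defaultdict
from typing import Callable, Optional

# (sentence, label): a toy stand-in for (sentence, parse tree).
Example = tuple[str, str]


class ToyViewLearner:
    """Hypothetical stand-in for a statistical parser.

    It sees one "view" of a sentence (whatever `view` extracts) and predicts
    the label most often paired with that view in its training data.
    """

    def __init__(self, view: Callable[[str], str]) -> None:
        self.view = view
        self.counts: dict[str, Counter] = {}

    def train(self, examples: list[Example]) -> None:
        # Retrain from scratch on the current (seed + co-trained) examples.
        counts: dict[str, Counter] = defaultdict(Counter)
        for sentence, label in examples:
            counts[self.view(sentence)][label] += 1
        self.counts = dict(counts)

    def predict(self, sentence: str) -> Optional[tuple[str, float]]:
        seen = self.counts.get(self.view(sentence))
        if not seen:
            return None  # view never observed in training: abstain
        label, n = seen.most_common(1)[0]
        return label, n / sum(seen.values())  # (label, confidence score)


def co_train(seed: list[Example], pool: list[str],
             view_a: Callable[[str], str], view_b: Callable[[str], str],
             rounds: int = 3, per_round: int = 1):
    """Each learner labels the raw pool; its most confident outputs are
    added to the OTHER learner's training data (assumed top-n selection)."""
    train_a, train_b = list(seed), list(seed)
    pool = list(pool)
    a, b = ToyViewLearner(view_a), ToyViewLearner(view_b)
    for _ in range(rounds):
        a.train(train_a)
        b.train(train_b)
        for teacher, student_data in ((a, train_b), (b, train_a)):
            scored = []
            for sentence in pool:
                out = teacher.predict(sentence)
                if out is not None:
                    label, score = out
                    scored.append((score, sentence, label))
            # Highest confidence first; ties keep pool order (stable sort).
            scored.sort(key=lambda t: t[0], reverse=True)
            for _, sentence, label in scored[:per_round]:
                student_data.append((sentence, label))
                pool.remove(sentence)
    a.train(train_a)
    b.train(train_b)
    return a, b, pool  # trained learners and still-unlabelled sentences


# Usage: a tiny labelled seed plus a (here, tiny) pool of raw sentences.
seed = [("the dog barks", "S1"), ("a cat sleeps", "S2")]
raw = ["the bird sleeps", "a dog barks"]
a, b, leftover = co_train(seed, raw,
                          view_a=lambda s: s.split()[0],   # first word
                          view_b=lambda s: s.split()[-1])  # last word
```

The two views are deliberately different so that each learner can contribute labels the other could not produce confidently on its own, which is the basic premise of co-training. A real setup would also cap the pool size, score parses with the parsers' own probabilities, and evaluate on held-out data after each round.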

Original language: English (US)
Pages: 331-338
Number of pages: 8
State: Published - 2003
Externally published: Yes
Event: 10th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2003 - Budapest, Hungary
Duration: Apr 12 2003 → Apr 17 2003

Conference

Conference: 10th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2003
Country/Territory: Hungary
City: Budapest
Period: 4/12/03 → 4/17/03

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
