Example selection for bootstrapping statistical parsers

Mark Steedman, Rebecca Hwa, Stephen Clark, Miles Osborne, Anoop Sarkar, Julia Hockenmaier, Paul Ruhlen, Steven Baker, Jeremiah Crim

Research output: Contribution to conference › Paper › peer-review

Abstract

This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-training, in which two parsers are iteratively re-trained on each other’s output; and a semi-supervised approach, corrected co-training, in which a human corrects each parser’s output before adding it to the training data. The selection of labeled training examples is an integral part of both frameworks. We propose several selection methods based on the criteria of minimizing errors in the data and maximizing training utility. We show that incorporating the utility criterion into the selection method results in better parsers for both frameworks.
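To make the co-training framework concrete, here is a minimal sketch of the iterative re-training loop with example selection. The parser interface (train(), parse() returning a tree and a confidence score) and the selection rule are illustrative assumptions; the paper's selection methods, based on minimizing error and maximizing training utility, are more nuanced than this simple confidence comparison.

```python
# Hypothetical sketch of co-training with example selection.
# The parser API and the confidence-based selection rule below are
# assumptions for illustration, not the authors' implementation.

def cotrain(parser_a, parser_b, labeled, unlabeled, rounds=10, batch=30):
    """Iteratively re-train two parsers on each other's selected output."""
    train_a, train_b = list(labeled), list(labeled)
    for _ in range(rounds):
        parser_a.train(train_a)
        parser_b.train(train_b)
        # Draw a small pool of unlabeled sentences for this round.
        pool = [unlabeled.pop() for _ in range(min(batch, len(unlabeled)))]
        for sentence in pool:
            tree_a, score_a = parser_a.parse(sentence)
            tree_b, score_b = parser_b.parse(sentence)
            # Selection: the more confident parser labels the example
            # for the other, trading off noise against training utility.
            if score_a > score_b:
                train_b.append(tree_a)
            elif score_b > score_a:
                train_a.append(tree_b)
    return parser_a, parser_b
```

In the corrected co-training variant described in the abstract, a human annotator would review each selected parse before it is appended to the other parser's training data, so the selection criterion shifts toward minimizing the correction effort per example.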

Original language: English (US)
State: Published - 2003
Externally published: Yes
Event: 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003 - Edmonton, Canada
Duration: May 27, 2003 – Jun 1, 2003

Conference

Conference: 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003
Country/Territory: Canada
City: Edmonton
Period: 5/27/03 – 6/1/03

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
