TY - CONF
T1 - Example selection for bootstrapping statistical parsers
AU - Steedman, Mark
AU - Hwa, Rebecca
AU - Clark, Stephen
AU - Osborne, Miles
AU - Sarkar, Anoop
AU - Hockenmaier, Julia
AU - Ruhlen, Paul
AU - Baker, Steven
AU - Crim, Jeremiah
N1 - Funding Information:
This work has been supported, in part, by the NSF/DARPA-funded 2002 Human Language Engineering Workshop at JHU, EPSRC grant GR/M96889, Department of Defense contract RD-02-5700, and ONR MURI contract FCPO.810548265. We would like to thank Chris Callison-Burch, Michael Collins, John Henderson, Lillian Lee, Andrew McCallum, and Fernando Pereira for helpful discussions, and Ric Crabbe, Adam Lopez, the participants of CS775 at Cornell University, and the reviewers for their comments on this paper.
Publisher Copyright:
© 2003 Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003. All rights reserved.
PY - 2003
Y1 - 2003
AB - This paper investigates bootstrapping for statistical parsers to reduce their reliance on manually annotated training data. We consider both a mostly-unsupervised approach, co-training, in which two parsers are iteratively re-trained on each other’s output; and a semi-supervised approach, corrected co-training, in which a human corrects each parser’s output before adding it to the training data. The selection of labeled training examples is an integral part of both frameworks. We propose several selection methods based on the criteria of minimizing errors in the data and maximizing training utility. We show that incorporating the utility criterion into the selection method results in better parsers for both frameworks.
UR - http://www.scopus.com/inward/record.url?scp=85055469713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85055469713&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85055469713
T2 - 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003
Y2 - 27 May 2003 through 1 June 2003
ER -