Data Selection in Semi-supervised Learning for Name Tagging

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present two semi-supervised learning techniques to improve a state-of-the-art multi-lingual name tagger. For English and Chinese, the overall system obtains a 1.7%-2.1% improvement in F-measure, representing a 13.5%-17.4% relative reduction in spurious, missing, and incorrect tags. We also conclude that simply relying upon large corpora is not in itself sufficient: we must pay attention to unlabeled data selection too. We describe effective measures to automatically select documents and sentences.

Original language: English (US)
Title of host publication: COLING ACL 2006 - Information Extraction Beyond The Document, Proceedings of the Workshop
Editors: Mary Elaine Califf, Mark A. Greenwood, Mark Stevenson, Roman Yangarber
Publisher: Association for Computational Linguistics (ACL)
Pages: 48-55
Number of pages: 8
ISBN (Electronic): 1932432744, 9781932432749
State: Published - 2006
Externally published: Yes
Event: 2006 Workshop on Information Extraction Beyond The Document, IE 2006 - Sydney, Australia
Duration: Jul 22 2006 → …

Publication series

Name: COLING ACL 2006 - Information Extraction Beyond The Document, Proceedings of the Workshop

Conference

Conference: 2006 Workshop on Information Extraction Beyond The Document, IE 2006
Country/Territory: Australia
City: Sydney
Period: 7/22/06 → …

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language
