Gender and animacy knowledge discovery from web-scale n-grams for unsupervised person mention detection

Heng Ji, Dekang Lin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper we present a simple approach to discover gender and animacy knowledge for person mention detection. We learn noun-gender and noun-animacy pair counts from web-scale n-grams using specific lexical patterns, and then apply confidence estimation metrics to filter noise. The selected informative pairs are then used to detect person mentions from raw texts in an unsupervised learning framework. Experiments showed that this approach can achieve high performance comparable to state-of-the-art supervised learning methods which require manually annotated corpora and gazetteers.
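The counting and filtering steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the pronoun-based pattern, the `PRONOUN_GENDER` map, the function names, the `min_count` and `min_ratio` thresholds, and the toy n-gram records are all assumptions introduced here, since the abstract does not specify the lexical patterns or the confidence estimation metrics.

```python
from collections import Counter, defaultdict

# Hypothetical pronoun-to-gender map (an assumption; the abstract does not
# list the actual lexical patterns).
PRONOUN_GENDER = {
    "his": "masculine", "himself": "masculine",
    "her": "feminine", "herself": "feminine",
    "its": "neutral", "itself": "neutral",
    "their": "plural", "themselves": "plural",
}


def count_noun_gender(ngrams):
    """Accumulate noun-gender counts from (tokens, frequency) records.

    Illustrative pattern only: the first token is treated as the noun and
    the last token as a pronoun that signals its gender.
    """
    counts = defaultdict(Counter)
    for tokens, freq in ngrams:
        if len(tokens) < 2:
            continue  # too short to contain both a noun and a pronoun
        noun, pronoun = tokens[0].lower(), tokens[-1].lower()
        gender = PRONOUN_GENDER.get(pronoun)
        if gender is not None:
            counts[noun][gender] += freq
    return counts


def select_confident(counts, min_count=10, min_ratio=0.8):
    """Keep nouns whose dominant gender is frequent and consistent enough.

    A simple stand-in for confidence estimation: require a minimum total
    count and a minimum share for the most frequent gender.
    """
    selected = {}
    for noun, by_gender in counts.items():
        total = sum(by_gender.values())
        gender, top = by_gender.most_common(1)[0]
        if total >= min_count and top / total >= min_ratio:
            selected[noun] = gender
    return selected


# Toy records standing in for web-scale n-grams (invented for illustration).
TOY_NGRAMS = [
    (("actress", "lost", "her"), 40),
    (("actress", "lost", "his"), 2),
    (("company", "raised", "its"), 55),
    (("doctor", "washed", "his"), 6),
    (("doctor", "washed", "her"), 5),
]

counts = count_noun_gender(TOY_NGRAMS)
selected = select_confident(counts)
```

In this toy run, "doctor" is dropped because neither gender dominates its counts, which is the kind of noisy pair a confidence filter is meant to remove before the remaining pairs are used for mention detection.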

Original language: English (US)
Title of host publication: PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation
Pages: 220-229
Number of pages: 10
State: Published - 2009
Externally published: Yes
Event: 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23 - Hong Kong, China
Duration: Dec 3 2009 to Dec 5 2009

Publication series

Name: PACLIC 23 - Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation
Volume: 1

Other

Other: 23rd Pacific Asia Conference on Language, Information and Computation, PACLIC 23
Country/Territory: China
City: Hong Kong
Period: 12/3/09 to 12/5/09

Keywords

  • Animacy
  • Gender
  • Knowledge discovery
  • Mention detection
  • N-grams

ASJC Scopus subject areas

  • Language and Linguistics
  • Computer Science (miscellaneous)
