Identification and tracing of ambiguous names: Discriminative and generative approaches

Xin Li, Paul Morie, Dan Roth

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

A given entity - representing a person, a location or an organization - may be mentioned in text in multiple, ambiguous ways. Understanding natural language requires identifying whether different mentions of a name, within and across documents, represent the same entity. We present two machine learning approaches to this problem, which we call the "Robust Reading" problem. Our first approach is a discriminative approach, trained in a supervised way. Our second approach is a generative model, at the heart of which is a view on how documents are generated and how names (of different entity types) are "sprinkled" into them. In its most general form, our model assumes: (1) a joint distribution over entities (e.g., a document that mentions "President Kennedy" is more likely to mention "Oswald" or "White House" than "Roger Clemens"), (2) an "author" model, which assumes that at least one mention of an entity in a document is easily identifiable, and then generates other mentions via (3) an appearance model, governing how mentions are transformed from the "representative" mention. We show that both approaches perform very accurately, in the range of 90%-95% F1 measure for different entity types, much better than previous approaches to (some aspects of) this problem. Our extensive experiments exhibit the contribution of relational and structural features and, somewhat surprisingly, show that the assumptions made within our generative model are strong enough to yield a very powerful approach that performs better than a supervised approach with limited supervised information.
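The three assumptions of the generative model can be illustrated with a toy sampler. This is only a sketch of the generative story as the abstract describes it, not the authors' implementation: the entity sets, their probabilities, the rewrite rules, and the names `ENTITY_SETS`, `appearance_variants` and `generate_document_mentions` are all invented for illustration.

```python
import random

# Hypothetical toy data: none of these sets or probabilities come from the paper.
# (1) Joint distribution over entities: which entities tend to co-occur in a document.
ENTITY_SETS = [
    (("John F. Kennedy", "Lee Harvey Oswald", "White House"), 0.7),
    (("Roger Clemens", "New York Yankees"), 0.3),
]


def appearance_variants(representative):
    """(3) Toy appearance model: surface forms derived from a representative mention."""
    tokens = representative.split()
    variants = [representative]
    if len(tokens) > 1:
        variants.append(tokens[-1])                      # last token only, e.g. "Kennedy"
        variants.append(f"{tokens[0]} {tokens[-1]}")     # first and last token
    return variants


def generate_document_mentions(rng, mentions_per_entity=3):
    """Sample (entity, surface form) pairs for one document."""
    sets, weights = zip(*ENTITY_SETS)
    entities = rng.choices(sets, weights=weights, k=1)[0]  # step (1)
    mentions = []
    for entity in entities:
        # (2) Toy author model: one mention per entity is the easily
        # identifiable "representative" form, emitted unchanged.
        mentions.append((entity, entity))
        for _ in range(mentions_per_entity - 1):
            # (3) Remaining mentions are transformed from the representative.
            mentions.append((entity, rng.choice(appearance_variants(entity))))
    return mentions


if __name__ == "__main__":
    for entity, surface in generate_document_mentions(random.Random(0)):
        print(f"{surface!r} -> {entity!r}")
```

Resolving names then amounts to inverting this process: given only the surface forms, infer which underlying entity most plausibly produced each one. The sketch shows only the forward (generation) direction.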

Original language: English (US)
Title of host publication: Proceedings - Nineteenth National Conference on Artificial Intelligence (AAAI-04)
Subtitle of host publication: Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2004)
Pages: 419-424
Number of pages: 6
State: Published - 2004
Event: Proceedings - Nineteenth National Conference on Artificial Intelligence (AAAI-2004): Sixteenth Innovative Applications of Artificial Intelligence Conference (IAAI-2004) - San Jose, CA, United States
Duration: Jul 25 2004 to Jul 29 2004


ASJC Scopus subject areas

  • Software
  • Artificial Intelligence
