Learning sufficient queries for entity filtering

Miles Efron, Craig Willis, Garrick Sherman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Entity-centric document filtering is the task of analyzing a time-ordered stream of documents and emitting those that are relevant to a specified set of entities (e.g., people, places, organizations). This task is exemplified by the TREC Knowledge Base Acceleration (KBA) track and has broad applicability in other modern IR settings. In this paper, we present a simple yet effective approach based on learning high-quality Boolean queries that can be applied deterministically during filtering. We call these Boolean statements sufficient queries. We argue that using deterministic queries for entity-centric filtering can reduce confounding factors seen in more familiar "score-then-threshold" filtering methods. Experiments on two standard datasets show significant improvements over state-of-the-art baseline models.
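As a rough illustration of what applying a Boolean query deterministically to a document stream could look like, here is a minimal sketch in Python. It is not the paper's method: the abstract does not describe how sufficient queries are learned, so the query below is written by hand, and every name in it (`term`, `all_of`, `any_of`, `not_`, `Doc`, `filter_stream`, the example entity and terms) is a hypothetical chosen for this example.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Set

# A query is modeled as a predicate over a document's set of lowercase tokens.
# This representation is an assumption made for the sketch.
Query = Callable[[Set[str]], bool]


def term(t: str) -> Query:
    """Match documents that contain the token t (case-insensitive)."""
    return lambda tokens: t.lower() in tokens


def all_of(*queries: Query) -> Query:
    """Boolean AND over sub-queries."""
    return lambda tokens: all(q(tokens) for q in queries)


def any_of(*queries: Query) -> Query:
    """Boolean OR over sub-queries."""
    return lambda tokens: any(q(tokens) for q in queries)


def not_(query: Query) -> Query:
    """Boolean NOT of a sub-query."""
    return lambda tokens: not query(tokens)


@dataclass
class Doc:
    timestamp: int
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def filter_stream(docs: Iterable[Doc], query: Query) -> Iterator[Doc]:
    """Emit each document, in arrival order, for which the query is true.

    There is no relevance score and no threshold: the decision for a
    document depends only on the Boolean query.
    """
    for doc in docs:
        if query(tokenize(doc.text)):
            yield doc


# Hand-written example query for a hypothetical entity.
query = all_of(
    term("turing"),
    any_of(term("alan"), term("enigma")),
    not_(term("award")),
)

stream = [
    Doc(1, "Alan Turing broke the Enigma cipher"),
    Doc(2, "The Turing Award was announced"),
    Doc(3, "A note on gardening"),
]

emitted = [doc.timestamp for doc in filter_stream(stream, query)]
print(emitted)  # [1]
```

In a score-then-threshold filter, by contrast, each document would receive a numeric score and the emit decision would depend on a separately chosen cutoff; the sketch above has no such parameter.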

Original language: English (US)
Title of host publication: SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval
Publisher: Association for Computing Machinery
Pages: 1091-1094
Number of pages: 4
ISBN (Print): 9781450322591
State: Published - 2014
Event: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014 - Gold Coast, QLD, Australia
Duration: Jul 6 2014 to Jul 11 2014

Publication series

Name: SIGIR 2014 - Proceedings of the 37th International ACM SIGIR Conference on Research and Development in Information Retrieval

Other

Other: 37th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2014
Country: Australia
City: Gold Coast, QLD
Period: 7/6/14 to 7/11/14

Keywords

  • Boolean models
  • Document filtering
  • Entity retrieval

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
