SetExpan: Corpus-Based Set Expansion via Context Feature Selection and Rank Ensemble

Jiaming Shen, Zeqiu Wu, Dongming Lei, Jingbo Shang, Xiang Ren, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Corpus-based set expansion (i.e., finding the “complete” set of entities belonging to the same semantic class, given a corpus and a tiny set of seed entities) is a critical task in knowledge discovery. It can facilitate numerous downstream applications, such as information extraction, taxonomy induction, question answering, and web search. To discover new entities for an expanded set, previous approaches either perform one-time entity ranking based on distributional similarity or resort to iterative pattern-based bootstrapping. The core challenge for these methods is handling the noisy context features derived from free-text corpora, which can lead to entity intrusion and semantic drift. In this study, we propose a novel framework, SetExpan, which tackles this problem with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding the entity set based on the denoised context features. Experiments on three datasets show that SetExpan is robust and outperforms previous state-of-the-art methods in terms of mean average precision.

Code related to this chapter is available at: https://github.com/mickeystroller/SetExpan

Data related to this chapter are available at: https://goo.gl/1suS3Z
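The two techniques named in the abstract can be illustrated with a minimal sketch. It assumes that each entity is represented by a weighted bag of context features, that feature selection scores each feature by its total weight over the current seed set and keeps the top few, that entity-entity similarity is weighted Jaccard on the selected features, and that rankings computed on random feature subsets are combined by mean reciprocal rank. All function names, parameters (`top_q`, `n_subsets`, `subset_size`, `per_iter`), and the toy data are hypothetical. This is not the authors' implementation (that is in the linked repository), and the actual method's feature types, weighting, and thresholds may differ.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus statistics: entity -> {context feature: weight}.
# Real features would be extracted from free text (e.g., skip-gram patterns).
FEATS = {
    "France": {"capital of __": 1.0, "__ borders": 1.0, "visit __": 0.5},
    "Germany": {"capital of __": 1.0, "__ borders": 1.0, "__ economy": 0.5},
    "Spain": {"capital of __": 1.0, "__ borders": 0.8, "visit __": 0.7},
    "Italy": {"capital of __": 0.9, "__ borders": 0.9, "__ economy": 0.4},
    "Google": {"__ acquired": 1.0, "__ CEO": 1.0},
    "Apple": {"__ acquired": 0.9, "__ CEO": 1.0},
}


def weighted_jaccard(a, b, features):
    """Weighted Jaccard similarity of two feature dicts, restricted to `features`."""
    num = sum(min(a.get(f, 0.0), b.get(f, 0.0)) for f in features)
    den = sum(max(a.get(f, 0.0), b.get(f, 0.0)) for f in features)
    return num / den if den > 0 else 0.0


def select_features(seeds, feats, top_q):
    """Assumed feature selection: score by total weight over the seeds, keep top_q."""
    score = defaultdict(float)
    for s in seeds:
        for f, w in feats[s].items():
            score[f] += w
    return sorted(score, key=lambda f: (-score[f], f))[:top_q]


def rank_ensemble(seeds, feats, selected, n_subsets, subset_size, rng):
    """Rank candidates on random feature subsets; combine by mean reciprocal rank."""
    candidates = [e for e in feats if e not in seeds]
    mrr = defaultdict(float)
    for _ in range(n_subsets):
        subset = rng.sample(selected, min(subset_size, len(selected)))
        # Score on this subset: mean similarity of the candidate to the current seeds.
        sims = {
            e: sum(weighted_jaccard(feats[e], feats[s], subset) for s in seeds) / len(seeds)
            for e in candidates
        }
        ranked = sorted(candidates, key=lambda e: (-sims[e], e))
        for rank, e in enumerate(ranked, start=1):
            mrr[e] += 1.0 / (rank * n_subsets)
    return sorted(candidates, key=lambda e: (-mrr[e], e))


def expand(seeds, feats, iterations=2, per_iter=1, top_q=4,
           n_subsets=10, subset_size=2, seed=0):
    """Iterate: reselect features for the grown set, then add the top-ranked entities."""
    rng = random.Random(seed)
    current = list(seeds)
    for _ in range(iterations):
        selected = select_features(current, feats, top_q)
        ranked = rank_ensemble(current, feats, selected, n_subsets, subset_size, rng)
        current.extend(ranked[:per_iter])
    return current


if __name__ == "__main__":
    # Spain and Italy are added; their order depends on the sampled feature subsets.
    print(expand(["France", "Germany"], FEATS))
```

In this sketch, an entity that scores well only because of one or two noisy features ranks high in only a few of the subset rankings, so its mean reciprocal rank stays low. That is one way to read how a rank ensemble can reduce entity intrusion.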

Original language: English (US)
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2017, Proceedings
Editors: Michelangelo Ceci, Saso Dzeroski, Celine Vens, Ljupco Todorovski, Jaakko Hollmen
Publisher: Springer-Verlag Berlin Heidelberg
Pages: 288-304
Number of pages: 17
ISBN (Print): 9783319712482
State: Published - 2017
Event: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2017 - Skopje, Macedonia, The Former Yugoslav Republic of
Duration: Sep 18 2017 to Sep 22 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10534 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Keywords

  • Bootstrapping
  • Information extraction
  • Set expansion
  • Unsupervised ranking-based ensemble

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)
