TY - GEN
T1 - SetExpan
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2017
AU - Shen, Jiaming
AU - Wu, Zeqiu
AU - Lei, Dongming
AU - Shang, Jingbo
AU - Ren, Xiang
AU - Han, Jiawei
N1 - Funding Information:
Acknowledgments. Research was sponsored in part by the U.S. Army Research Lab. under Cooperative Agreement No. W911NF-09-2-0053 (NSCTA), National Science Foundation IIS-1320617, IIS 16-18481, and NSF IIS 17-04532, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov).
PY - 2017
Y1 - 2017
N2 - Corpus-based set expansion (i.e., finding the “complete” set of entities belonging to the same semantic class, based on a given corpus and a tiny set of seeds) is a critical task in knowledge discovery. It may facilitate numerous downstream applications, such as information extraction, taxonomy induction, question answering, and web search. To discover new entities in an expanded set, previous approaches either perform a one-time entity ranking based on distributional similarity or resort to iterative pattern-based bootstrapping. The core challenge for these methods is how to deal with noisy context features derived from free-text corpora, which may lead to entity intrusion and semantic drifting. In this study, we propose a novel framework, SetExpan, which tackles this problem with two techniques: (1) a context feature selection method that selects clean context features for calculating entity-entity distributional similarity, and (2) a ranking-based unsupervised ensemble method for expanding the entity set based on denoised context features. Experiments on three datasets show that SetExpan is robust and outperforms previous state-of-the-art methods in terms of mean average precision. Code related to this chapter is available at https://github.com/mickeystroller/SetExpan and data related to this chapter are available at https://goo.gl/1suS3Z
KW - Bootstrapping
KW - Information extraction
KW - Set expansion
KW - Unsupervised ranking-based ensemble
UR - http://www.scopus.com/inward/record.url?scp=85040258053&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85040258053&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-71249-9_18
DO - 10.1007/978-3-319-71249-9_18
M3 - Conference contribution
AN - SCOPUS:85040258053
SN - 9783319712482
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 288
EP - 304
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2017, Proceedings
A2 - Ceci, Michelangelo
A2 - Dzeroski, Saso
A2 - Vens, Celine
A2 - Todorovski, Ljupco
A2 - Hollmen, Jaakko
PB - Springer-Verlag Berlin Heidelberg
Y2 - 18 September 2017 through 22 September 2017
ER -