TY - GEN
T1 - Automatic synonym discovery with knowledge bases
AU - Qu, Meng
AU - Ren, Xiang
AU - Han, Jiawei
N1 - Publisher Copyright:
© 2017 ACM.
PY - 2017/8/13
Y1 - 2017/8/13
N2 - Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input to find out other names that are synonymous, ignoring the fact that often times a name string can refer to multiple entities (e.g., "apple" could refer to both Apple Inc and the fruit apple). Moreover, most existing methods require training data manually created by domain experts to construct supervised-learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually-curated synonyms for each entity stored in a knowledge base not only form a set of name strings to disambiguate the meaning for each other, but also can serve as "distant" supervision to help determine important features for the task. We propose a novel framework, called DPE, to integrate two kinds of mutually-complementing signals for synonym discovery, i.e., distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other in the training stage. At the inference stage, both signals will be utilized to discover synonyms for the given entities. Experimental results prove the effectiveness of the proposed framework.
AB - Recognizing entity synonyms from text has become a crucial task in many entity-leveraging applications. However, discovering entity synonyms from domain-specific text corpora (e.g., news articles, scientific papers) is rather challenging. Current systems take an entity name string as input to find out other names that are synonymous, ignoring the fact that often times a name string can refer to multiple entities (e.g., "apple" could refer to both Apple Inc and the fruit apple). Moreover, most existing methods require training data manually created by domain experts to construct supervised-learning systems. In this paper, we study the problem of automatic synonym discovery with knowledge bases, that is, identifying synonyms for knowledge base entities in a given domain-specific corpus. The manually-curated synonyms for each entity stored in a knowledge base not only form a set of name strings to disambiguate the meaning for each other, but also can serve as "distant" supervision to help determine important features for the task. We propose a novel framework, called DPE, to integrate two kinds of mutually-complementing signals for synonym discovery, i.e., distributional features based on corpus-level statistics and textual patterns based on local contexts. In particular, DPE jointly optimizes the two kinds of signals in conjunction with distant supervision, so that they can mutually enhance each other in the training stage. At the inference stage, both signals will be utilized to discover synonyms for the given entities. Experimental results prove the effectiveness of the proposed framework.
UR - http://www.scopus.com/inward/record.url?scp=85029125643&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029125643&partnerID=8YFLogxK
U2 - 10.1145/3097983.3098185
DO - 10.1145/3097983.3098185
M3 - Conference contribution
AN - SCOPUS:85029125643
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 997
EP - 1005
BT - KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
Y2 - 13 August 2017 through 17 August 2017
ER -