TY - GEN
T1 - Guiding Corpus-based Set Expansion by Auxiliary Sets Generation and Co-Expansion
AU - Huang, Jiaxin
AU - Xie, Yiqing
AU - Meng, Yu
AU - Shen, Jiaming
AU - Zhang, Yunyi
AU - Han, Jiawei
N1 - Publisher Copyright: © 2020 ACM.
PY - 2020/4/20
Y1 - 2020/4/20
AB - Given a small set of seed entities (e.g., "USA", "Russia"), corpus-based set expansion aims to induce, from a given corpus, an extensive set of entities that share the same semantic class (Country in this example). Set expansion benefits a wide range of downstream applications in knowledge discovery, such as web search, taxonomy construction, and query suggestion. Existing corpus-based set expansion algorithms typically bootstrap the given seeds by incorporating lexical patterns and distributional similarity. However, because no negative sets are provided explicitly, these methods suffer from semantic drift caused by expanding the seed set freely without guidance. We propose a new framework, Set-CoExpan, that automatically generates auxiliary sets as negative sets closely related to the user's target set, and then performs multi-set co-expansion, which extracts discriminative features by comparing the target set with the auxiliary sets, to form multiple cohesive sets that are distinct from one another, thus resolving the semantic drift issue. In this paper, we demonstrate that generating auxiliary sets guides the expansion of the target set away from the ambiguous areas around its border with the auxiliary sets, and we show that Set-CoExpan significantly outperforms strong baseline methods.
KW - Bootstrap Methods
KW - Semantic Computing
KW - Set Expansion
KW - Web Mining
UR - http://www.scopus.com/inward/record.url?scp=85086572023&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85086572023&partnerID=8YFLogxK
U2 - 10.1145/3366423.3380284
DO - 10.1145/3366423.3380284
M3 - Conference contribution
AN - SCOPUS:85086572023
T3 - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
SP - 2188
EP - 2198
BT - The Web Conference 2020 - Proceedings of the World Wide Web Conference, WWW 2020
PB - Association for Computing Machinery
T2 - 29th International World Wide Web Conference, WWW 2020
Y2 - 20 April 2020 through 24 April 2020
ER -