TY - GEN
T1 - Discovering hypernymy in text-rich heterogeneous information network by exploiting context granularity
AU - Shi, Yu
AU - Shen, Jiaming
AU - Li, Yuchen
AU - Zhang, Naijing
AU - He, Xinwei
AU - Lou, Zhengzhi
AU - Zhu, Qi
AU - Walker, Matthew
AU - Kim, Myunghwan
AU - Han, Jiawei
N1 - Publisher Copyright: © 2019 Association for Computing Machinery.
PY - 2019/11/3
Y1 - 2019/11/3
N2 - Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as the is-a relation or subclass-of relation, lies at the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based on its context. These approaches rely on statistical signals from the textual corpus, and their effectiveness is therefore hindered when the signals from the corpus are not sufficient for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human-labeled data. HyperMine extends the definition of “context” to the scenario of text-rich HINs. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities, and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired from high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further show, in a case study, that a high-quality taxonomy can be generated solely from the hypernymy discovered by HyperMine.
AB - Text-rich heterogeneous information networks (text-rich HINs) are ubiquitous in real-world applications. Hypernymy, also known as the is-a relation or subclass-of relation, lies at the core of many knowledge graphs and benefits many downstream applications. Existing methods of hypernymy discovery either leverage textual patterns to extract explicitly mentioned hypernym-hyponym pairs, or learn a distributional representation for each term of interest based on its context. These approaches rely on statistical signals from the textual corpus, and their effectiveness is therefore hindered when the signals from the corpus are not sufficient for all terms of interest. In this work, we propose to discover hypernymy in text-rich HINs, which can introduce additional high-quality signals. We develop a new framework, named HyperMine, that exploits multi-granular contexts and combines signals from both text and network without human-labeled data. HyperMine extends the definition of “context” to the scenario of text-rich HINs. For example, we can define typed nodes and communities as contexts. These contexts encode signals of different granularities, and we feed them into a hypernymy inference model. HyperMine learns this model using weak supervision acquired from high-precision textual patterns. Extensive experiments on two large real-world datasets demonstrate the effectiveness of HyperMine and the utility of modeling context granularity. We further show, in a case study, that a high-quality taxonomy can be generated solely from the hypernymy discovered by HyperMine.
KW - Distributional Inclusion Hypothesis
KW - Heterogeneous Information Network
KW - Hypernymy Discovery
KW - Text-rich Network
UR - http://www.scopus.com/inward/record.url?scp=85075449270&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85075449270&partnerID=8YFLogxK
U2 - 10.1145/3357384.3357866
DO - 10.1145/3357384.3357866
M3 - Conference contribution
AN - SCOPUS:85075449270
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 599
EP - 608
BT - CIKM 2019 - Proceedings of the 28th ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 28th ACM International Conference on Information and Knowledge Management, CIKM 2019
Y2 - 3 November 2019 through 7 November 2019
ER -