TY - GEN
T1 - Text classification with heterogeneous information network kernels
AU - Wang, Chenguang
AU - Song, Yangqiu
AU - Li, Haoran
AU - Zhang, Ming
AU - Han, Jiawei
N1 - Funding Information:
Chenguang Wang gratefully acknowledges the support by the National Natural Science Foundation of China (NSFC Grant No. 61472006 and 61272343), the National Basic Research Program (973 Program No. 2014CB340405), and Doctoral Fund of Ministry of Education of China (MOEC RFDP Grant No. 20130001110032). The research is also partially supported by the Army Research Laboratory (ARL) under agreement W911NF-09-2-0053, and by DARPA under agreement number FA8750-13-2-0008. Research is also partially sponsored by National Science Foundation IIS-1017362, IIS-1320617, and IIS-1354329, HDTRA1-10-1-0120, and grant 1U54GM114838 awarded by NIGMS through funds provided by the trans-NIH Big Data to Knowledge (BD2K) initiative (www.bd2k.nih.gov), and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.
Publisher Copyright:
© Copyright 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2016
Y1 - 2016
N2 - Text classification is an important problem with many applications. Traditional approaches represent text as a bag-of-words and build classifiers based on this representation. Compared with words, entity phrases, the relations between entities, and the types of entities and relations carry much more information for representing texts. This paper presents a novel text-as-network classification framework, which introduces 1) a structured and typed heterogeneous information network (HIN) representation of texts, and 2) a meta-path based approach to link texts. We show that with the new representation and links of texts, the structured and typed information of entities and relations can be incorporated into kernels. In particular, we develop both a simple linear kernel and an indefinite kernel based on meta-paths in the HIN representation of texts, which we call HIN-kernels. Using Freebase, a well-known world knowledge base, to construct HINs for texts, our experiments on two benchmark datasets show that the indefinite HIN-kernel based on weighted meta-paths outperforms the state-of-the-art methods and other HIN-kernels.
AB - Text classification is an important problem with many applications. Traditional approaches represent text as a bag-of-words and build classifiers based on this representation. Compared with words, entity phrases, the relations between entities, and the types of entities and relations carry much more information for representing texts. This paper presents a novel text-as-network classification framework, which introduces 1) a structured and typed heterogeneous information network (HIN) representation of texts, and 2) a meta-path based approach to link texts. We show that with the new representation and links of texts, the structured and typed information of entities and relations can be incorporated into kernels. In particular, we develop both a simple linear kernel and an indefinite kernel based on meta-paths in the HIN representation of texts, which we call HIN-kernels. Using Freebase, a well-known world knowledge base, to construct HINs for texts, our experiments on two benchmark datasets show that the indefinite HIN-kernel based on weighted meta-paths outperforms the state-of-the-art methods and other HIN-kernels.
UR - http://www.scopus.com/inward/record.url?scp=85005958335&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85005958335&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85005958335
T3 - 30th AAAI Conference on Artificial Intelligence, AAAI 2016
SP - 2130
EP - 2136
BT - 30th AAAI Conference on Artificial Intelligence, AAAI 2016
PB - American Association for Artificial Intelligence (AAAI) Press
T2 - 30th AAAI Conference on Artificial Intelligence, AAAI 2016
Y2 - 12 February 2016 through 17 February 2016
ER -