Text classification with heterogeneous information network kernels

Chenguang Wang, Yangqiu Song, Haoran Li, Ming Zhang, Jiawei Han

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Text classification is an important problem with many applications. Traditional approaches represent text as a bagof- words and build classifiers based on this representation. Rather than words, entity phrases, the relations between the entities, as well as the types of the entities and relations carry much more information to represent the texts. This paper presents a novel text as network classification framework, which introduces 1) a structured and typed heterogeneous information networks (HINs) representation of texts, and 2) a meta-path based approach to link texts. We show that with the new representation and links of texts, the structured and typed information of entities and relations can be incorporated into kernels. Particularly, we develop both simple linear kernel and indefinite kernel based on metapaths in the HIN representation of texts, where we call them HIN-kernels. Using Freebase, a well-known world knowledge base, to construct HIN for texts, our experiments on two benchmark datasets show that the indefinite HIN-kernel based on weighted meta-paths outperforms the state-of-Theart methods and other HIN-kernels.

Original languageEnglish (US)
Title of host publication30th AAAI Conference on Artificial Intelligence, AAAI 2016
PublisherAmerican Association for Artificial Intelligence (AAAI) Press
Pages2130-2136
Number of pages7
ISBN (Electronic)9781577357605
StatePublished - 2016
Event30th AAAI Conference on Artificial Intelligence, AAAI 2016 - Phoenix, United States
Duration: Feb 12 2016Feb 17 2016

Publication series

Name30th AAAI Conference on Artificial Intelligence, AAAI 2016

Other

Other30th AAAI Conference on Artificial Intelligence, AAAI 2016
Country/TerritoryUnited States
CityPhoenix
Period2/12/162/17/16

ASJC Scopus subject areas

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Text classification with heterogeneous information network kernels'. Together they form a unique fingerprint.

Cite this