Integrating clustering with ranking in heterogeneous information networks analysis

Yizhou Sun, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Heterogeneous information networks, i.e., logical networks involving multi-typed, interconnected objects, are ubiquitous. For example, a bibliographic information network contains nodes including authors, conferences, terms and papers, and links corresponding to the relations existing between these objects. Extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views of information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) into one huge cluster without distinction is dull as well. In contrast, a good cluster can lead to a meaningful ranking for the objects in that cluster, and the ranking distributions for these objects can serve as good features to help clustering. Two ranking-based clustering algorithms, RankClus and NetClus, are thus proposed. RankClus aims at clustering target objects using the attribute objects in the remaining network, while NetClus is able to generate net-clusters containing multiple types of objects following the same schema as the original network. The basic idea of such algorithms is that the ranking distributions of objects in different clusters should be quite different from each other, so they can serve as features of clusters, and new measures of objects can be calculated accordingly. In turn, better clustering results lead to better ranking results. Ranking and clustering can thus be mutually enhanced: ranking provides a better measure space, and clustering provides a more reasonable ranking distribution. What's more, clusters obtained in this way are more informative than those obtained by other methods, given the ranking distribution for objects in each cluster.
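
The mutual enhancement described in the abstract can be illustrated with a small Python sketch. This is a deliberately simplified toy and not the RankClus or NetClus procedure from the chapter: it assumes a bi-typed network of target objects (conferences) linked to attribute objects (authors), uses a plain degree-based ranking inside each cluster, and reassigns each target to the cluster whose ranking distribution it matches best. The function names, the smoothing constant, the toy link weights, and the starting partition are all hypothetical.

```python
def rank_attributes(links, members, smoothing=1e-3):
    """Degree-based ranking distribution over attribute objects,
    computed only from the targets in one cluster (an assumed, simple ranking)."""
    attrs = {a for weights in links.values() for a in weights}
    counts = {a: smoothing for a in attrs}  # smoothing keeps empty clusters valid
    for t in members:
        for a, w in links[t].items():
            counts[a] += w
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}


def cluster_rankings(links, assignment, k):
    """One ranking distribution per cluster, conditional on the current partition."""
    return [
        rank_attributes(links, [t for t, c in assignment.items() if c == cluster])
        for cluster in range(k)
    ]


def affinity(weights, ranking):
    """Link-weighted average rank score of a target's attribute objects."""
    total = sum(weights.values())
    return sum(w * ranking[a] for a, w in weights.items()) / total


def rank_then_cluster(links, assignment, k, max_iter=20):
    """Alternate between ranking within clusters and reassigning targets."""
    assignment = dict(assignment)
    for _ in range(max_iter):
        rankings = cluster_rankings(links, assignment, k)
        new_assignment = {
            t: max(range(k), key=lambda c: affinity(links[t], rankings[c]))
            for t in links
        }
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, cluster_rankings(links, assignment, k)


# Hypothetical toy data: conference -> {author: number of papers}.
links = {
    "SIGMOD": {"alice": 5, "bob": 3, "carol": 1},
    "VLDB": {"alice": 4, "bob": 4},
    "ISCA": {"dave": 6, "erin": 2},
    "MICRO": {"dave": 3, "erin": 5, "carol": 1},
}
# A poor starting partition that mixes the two areas.
start = {"SIGMOD": 0, "VLDB": 0, "ISCA": 0, "MICRO": 1}

assignment, rankings = rank_then_cluster(links, start, k=2)
for cluster, ranking in enumerate(rankings):
    members = sorted(t for t, c in assignment.items() if c == cluster)
    top_author = max(ranking, key=ranking.get)
    print(cluster, members, top_author)
```

On this toy input the loop separates the two database conferences from the two architecture conferences, and each cluster ends up with its own top-ranked author, which is the kind of per-cluster ranking the abstract argues is more meaningful than a single global one. The chapter's algorithms use more elaborate ranking functions and cluster models than this sketch.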

Original language: English (US)
Title of host publication: Link Mining
Subtitle of host publication: Models, Algorithms, and Applications
Publisher: Springer New York
Pages: 439-473
Number of pages: 35
Volume: 9781441965158
ISBN (Electronic): 9781441965158
ISBN (Print): 9781441965141
State: Published - Jan 1 2010

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Sun, Y., & Han, J. (2010). Integrating clustering with ranking in heterogeneous information networks analysis. In Link Mining: Models, Algorithms, and Applications (Vol. 9781441965158, pp. 439-473). Springer New York. https://doi.org/10.1007/978-1-4419-6515-8-17