Integrating clustering with ranking in heterogeneous information networks analysis

Yizhou Sun, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Heterogeneous information networks, i.e., the logic networks involving multi-typed, interconnected objects, are ubiquitous. For example, a bibliographic information network contains nodes including authors, conferences, terms and papers, and links corresponding to relations existing between these objects. Extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) into one huge cluster without distinction is dull as well. In contrast, a good cluster can lead to meaningful ranking for objects in that cluster, and ranking distributions for these objects can serve as good features to help clustering. Two ranking-based clustering algorithms, RankClus and NetClus, are thus proposed. RankClus aims at clustering target objects using the attribute objects in the remaining network, while NetClus is able to generate net-clusters containing multiple types of objects following the same schema of the original network. The basic idea of such algorithms is that ranking distributions of objects in each cluster should be quite different from each other, so they can serve as features of clusters, and new measures of objects can be calculated accordingly. Also, better clustering results can achieve better ranking results. Ranking and clustering can be mutually enhanced, where ranking provides better measure space and clustering provides more reasonable ranking distribution. What's more, clusters obtained in this way are more informative than those from other methods, given the ranking distribution for objects in each cluster.
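
The mutual enhancement of ranking and clustering described in the abstract can be illustrated with a small Python sketch. This is not the authors' RankClus implementation; it is a simplified stand-in under stated assumptions. The function names, the toy conference-by-author link matrix, the smoothing constant, the link-share ranking, and the hard reassignment by log-likelihood are all hypothetical choices made for illustration only.

```python
import math

def rank_distribution(links, members, n_attrs, smooth=0.1):
    """Rank attribute objects inside one cluster by their smoothed share
    of link weight from the cluster's target objects (an assumed, very
    simple ranking function)."""
    totals = [smooth] * n_attrs
    for t in members:
        for a, w in enumerate(links[t]):
            totals[a] += w
    norm = sum(totals)
    return [v / norm for v in totals]

def log_likelihood(row, dist):
    """Score how well one target object's links fit a cluster's ranking."""
    return sum(w * math.log(p) for w, p in zip(row, dist))

def rank_cluster(links, assignment, k, max_iter=20):
    """Alternate between ranking within clusters and reassigning targets."""
    n_attrs = len(links[0])
    dists = []
    for _ in range(max_iter):
        # Ranking step: one ranking distribution per current cluster.
        dists = [
            rank_distribution(
                links,
                [t for t, lab in enumerate(assignment) if lab == c],
                n_attrs,
            )
            for c in range(k)
        ]
        # Clustering step: move each target to its best-fitting cluster.
        new_assignment = [
            max(range(k), key=lambda c: log_likelihood(row, dists[c]))
            for row in links
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return assignment, dists

# Toy data (hypothetical): rows are conferences (target objects),
# columns are authors (attribute objects), entries are paper counts.
links = [
    [5, 4, 0, 1],
    [4, 6, 1, 0],
    [0, 1, 5, 5],
    [1, 0, 4, 6],
]
# Start from a deliberately poor partition that mixes the two groups.
assignment, dists = rank_cluster(links, [0, 1, 0, 1], k=2)
print(assignment)
for c, dist in enumerate(dists):
    print(c, [round(p, 3) for p in dist])
```

Each cluster ends up with its own ranking distribution over authors, which is what makes the resulting clusters easier to interpret than a bare partition. The actual algorithms use more refined ranking functions and a soft, mixture-model-style measure rather than the hard reassignment assumed here.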

Original language: English (US)
Title of host publication: Link Mining
Subtitle of host publication: Models, Algorithms, and Applications
Publisher: Springer New York
Pages: 439-473
Number of pages: 35
Volume: 9781441965158
ISBN (Electronic): 9781441965158
ISBN (Print): 9781441965141
DOIs: https://doi.org/10.1007/978-1-4419-6515-8-17
State: Published - Jan 1 2010

Fingerprint

Information Services
Cluster Analysis
Computer Systems
Databases

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Sun, Y., & Han, J. (2010). Integrating clustering with ranking in heterogeneous information networks analysis. In Link Mining: Models, Algorithms, and Applications (Vol. 9781441965158, pp. 439-473). Springer New York. https://doi.org/10.1007/978-1-4419-6515-8-17

Integrating clustering with ranking in heterogeneous information networks analysis. / Sun, Yizhou; Han, Jiawei.

Link Mining: Models, Algorithms, and Applications. Vol. 9781441965158 Springer New York, 2010. p. 439-473.

Sun, Y & Han, J 2010, Integrating clustering with ranking in heterogeneous information networks analysis. in Link Mining: Models, Algorithms, and Applications. vol. 9781441965158, Springer New York, pp. 439-473. https://doi.org/10.1007/978-1-4419-6515-8-17
Sun Y, Han J. Integrating clustering with ranking in heterogeneous information networks analysis. In Link Mining: Models, Algorithms, and Applications. Vol. 9781441965158. Springer New York. 2010. p. 439-473 https://doi.org/10.1007/978-1-4419-6515-8-17
Sun, Yizhou ; Han, Jiawei. / Integrating clustering with ranking in heterogeneous information networks analysis. Link Mining: Models, Algorithms, and Applications. Vol. 9781441965158 Springer New York, 2010. pp. 439-473
@inbook{303573aa68fc4188b763dc7a9fe19ede,
title = "Integrating clustering with ranking in heterogeneous information networks analysis",
abstract = "Heterogeneous information networks, i.e., the logic networks involving multi-typed, interconnected objects, are ubiquitous. For example, a bibliographic information network contains nodes including authors, conferences, terms and papers, and links corresponding to relations existing between these objects. Extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) into one huge cluster without distinction is dull as well. In contrast, a good cluster can lead to meaningful ranking for objects in that cluster, and ranking distributions for these objects can serve as good features to help clustering. Two ranking-based clustering algorithms, RankClus and NetClus, are thus proposed. RankClus aims at clustering target objects using the attribute objects in the remaining network, while NetClus is able to generate net-clusters containing multiple types of objects following the same schema of the original network. The basic idea of such algorithms is that ranking distributions of objects in each cluster should be quite different from each other, so they can serve as features of clusters, and new measures of objects can be calculated accordingly. Also, better clustering results can achieve better ranking results. Ranking and clustering can be mutually enhanced, where ranking provides better measure space and clustering provides more reasonable ranking distribution. What's more, clusters obtained in this way are more informative than those from other methods, given the ranking distribution for objects in each cluster.",
author = "Yizhou Sun and Jiawei Han",
year = "2010",
month = "1",
day = "1",
doi = "10.1007/978-1-4419-6515-8-17",
language = "English (US)",
isbn = "9781441965141",
volume = "9781441965158",
pages = "439--473",
booktitle = "Link Mining",
publisher = "Springer New York",

}

TY - CHAP

T1 - Integrating clustering with ranking in heterogeneous information networks analysis

AU - Sun, Yizhou

AU - Han, Jiawei

PY - 2010/1/1

Y1 - 2010/1/1

N2 - Heterogeneous information networks, i.e., the logic networks involving multi-typed, interconnected objects, are ubiquitous. For example, a bibliographic information network contains nodes including authors, conferences, terms and papers, and links corresponding to relations existing between these objects. Extracting knowledge from information networks has become an important task. Both ranking and clustering can provide overall views on information network data, and each has been a hot topic by itself. However, ranking objects globally without considering which clusters they belong to often leads to dumb results, e.g., ranking database and computer architecture conferences together may not make much sense. Similarly, clustering a huge number of objects (e.g., thousands of authors) into one huge cluster without distinction is dull as well. In contrast, a good cluster can lead to meaningful ranking for objects in that cluster, and ranking distributions for these objects can serve as good features to help clustering. Two ranking-based clustering algorithms, RankClus and NetClus, are thus proposed. RankClus aims at clustering target objects using the attribute objects in the remaining network, while NetClus is able to generate net-clusters containing multiple types of objects following the same schema of the original network. The basic idea of such algorithms is that ranking distributions of objects in each cluster should be quite different from each other, so they can serve as features of clusters, and new measures of objects can be calculated accordingly. Also, better clustering results can achieve better ranking results. Ranking and clustering can be mutually enhanced, where ranking provides better measure space and clustering provides more reasonable ranking distribution. What's more, clusters obtained in this way are more informative than those from other methods, given the ranking distribution for objects in each cluster.

UR - http://www.scopus.com/inward/record.url?scp=84919836416&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84919836416&partnerID=8YFLogxK

U2 - 10.1007/978-1-4419-6515-8-17

DO - 10.1007/978-1-4419-6515-8-17

M3 - Chapter

AN - SCOPUS:84919836416

SN - 9781441965141

VL - 9781441965158

SP - 439

EP - 473

BT - Link Mining

PB - Springer New York

ER -