Relation strength-aware clustering of heterogeneous information networks with incomplete attributes

Yizhou Sun, Charu C. Aggarwal, Jiawei Han

Research output: Contribution to journal > Article > peer-review


With the rapid development of online social media, online shopping sites and cyber-physical systems, heterogeneous information networks have become increasingly popular and content-rich over time. In many cases, such networks contain multiple types of objects and links, as well as different kinds of attributes. The clustering of these objects can provide useful insights in many applications. However, the clustering of such networks can be challenging since (a) the attribute values of objects are often incomplete, which implies that an object may carry only partial attributes or even no attributes to correctly label itself; and (b) the links of different types may carry different kinds of semantic meanings, and it is a difficult task to determine the nature of their relative importance in helping the clustering for a given purpose. In this paper, we address these challenges by proposing a model-based clustering algorithm. We design a probabilistic model which clusters the objects of different types into a common hidden space, by using a user-specified set of attributes, as well as the links from different relations. The strengths of different types of links are automatically learned, and are determined by the given purpose of clustering. An iterative algorithm is designed for solving the clustering problem, in which the strengths of different types of links and the quality of clustering results mutually enhance each other. Our experimental results on real and synthetic data sets demonstrate the effectiveness and efficiency of the algorithm.
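The mutual-enhancement loop described in the abstract can be illustrated with a small toy sketch. This is not the paper's probabilistic model: the update rules, the unit-variance Gaussian attribute likelihood, the function name `cluster_toy`, and the relation names `coauthor` and `noise` are all assumptions made for illustration. It only shows the general idea that soft cluster memberships and per-relation link strengths can be re-estimated in alternation, and that objects without attributes can be placed through their links.

```python
import math

def cluster_toy(attrs, links, k=2, iters=20):
    """Toy alternating updates (illustrative assumption, not the paper's model).

    attrs: list of floats, or None where the attribute is missing.
    links: dict mapping a relation name to a list of (i, j) undirected edges.
    Assumes k >= 2 and at least one observed attribute.
    """
    n = len(attrs)
    observed = [a for a in attrs if a is not None]
    lo, hi = min(observed), max(observed)
    # Spread the initial cluster means over the observed attribute range.
    means = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    theta = [[1.0 / k] * k for _ in range(n)]   # soft memberships
    gamma = {r: 1.0 for r in links}             # relation strengths

    for _ in range(iters):
        # Step 1: update memberships from neighbours, weighted by strength.
        new_theta = []
        for i in range(n):
            score = [1e-3] * k  # small smoothing term
            for r, edges in links.items():
                for a, b in edges:
                    if a == i:
                        nb = b
                    elif b == i:
                        nb = a
                    else:
                        continue
                    for c in range(k):
                        score[c] += gamma[r] * theta[nb][c]
            if attrs[i] is not None:
                # Unit-variance Gaussian log-likelihood, shifted for stability.
                ll = [-0.5 * (attrs[i] - m) ** 2 for m in means]
                top = max(ll)
                score = [s * math.exp(l - top) for s, l in zip(score, ll)]
            total = sum(score)
            new_theta.append([s / total for s in score])
        theta = new_theta

        # Step 2: update cluster means from objects that carry an attribute.
        for c in range(k):
            w = sum(theta[i][c] for i in range(n) if attrs[i] is not None)
            if w > 0:
                means[c] = sum(theta[i][c] * attrs[i]
                               for i in range(n) if attrs[i] is not None) / w

        # Step 3: update strengths from how well linked objects agree.
        agree = {}
        for r, edges in links.items():
            dots = [sum(x * y for x, y in zip(theta[a], theta[b]))
                    for a, b in edges]
            agree[r] = sum(dots) / len(dots)
        norm = sum(agree.values())
        gamma = {r: len(links) * agree[r] / norm for r in links}

    return theta, gamma, means

# Objects 2 and 5 have no attribute and must be placed through their links.
attrs = [0.0, 0.1, None, 5.0, 5.1, None]
links = {
    "coauthor": [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)],
    "noise": [(0, 3), (1, 4), (2, 5)],
}
theta, gamma, means = cluster_toy(attrs, links)
print(gamma)
print([round(p, 3) for p in theta[2]], [round(p, 3) for p in theta[5]])
```

In this toy data the `coauthor` links connect objects with similar attributes while the `noise` links cross between groups, so the agreement-based update gives `coauthor` the larger strength, which in turn pulls the attribute-free objects toward the cluster of their `coauthor` neighbours.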

Original language: English (US)
Pages (from-to): 394-405
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Issue number: 5
State: Published - Jan 2012

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

