GIN: A clustering model for capturing dual heterogeneity in networked data

Jialu Liu, Chi Wang, Jing Gao, Quanquan Gu, Charu Aggarwal, Lance Kaplan, Jiawei Han

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Networked data often consists of interconnected multi-typed nodes and links. A common assumption behind such heterogeneity is the shared clustering structure. However, existing network clustering approaches oversimplify the heterogeneity by treating either nodes or links in a homogeneous fashion, resulting in massive loss of information. In addition, these studies are more or less restricted to specific network schemas or applications, losing generality. In this paper, we introduce a flexible model to explain the process of forming heterogeneous links based on shared clustering information of heterogeneous nodes. Specifically, we categorize the link generation process into binary and weighted cases and model them respectively. We show these two cases can be seamlessly integrated into a unified model. We propose to maximize a joint log-likelihood function to infer the model efficiently with Expectation Maximization (EM) algorithms. Experiments on real-world networked data sets demonstrate the effectiveness and flexibility of the proposed method in fully capturing the dual heterogeneity of both nodes and links.

Original language: English (US)
Title of host publication: SIAM International Conference on Data Mining 2015, SDM 2015
Editors: Suresh Venkatasubramanian, Jieping Ye
Publisher: Society for Industrial and Applied Mathematics Publications
Pages: 388-396
Number of pages: 9
ISBN (Electronic): 9781510811522
DOIs: https://doi.org/10.1137/1.9781611974010.44
State: Published - Jan 1 2015
Event: SIAM International Conference on Data Mining 2015, SDM 2015 - Vancouver, Canada
Duration: Apr 30 2015 to May 2 2015

Publication series

Name: SIAM International Conference on Data Mining 2015, SDM 2015

Other

Other: SIAM International Conference on Data Mining 2015, SDM 2015
Country: Canada
City: Vancouver
Period: 4/30/15 to 5/2/15

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Vision and Pattern Recognition
  • Software

  • Cite this

    Liu, J., Wang, C., Gao, J., Gu, Q., Aggarwal, C., Kaplan, L., & Han, J. (2015). GIN: A clustering model for capturing dual heterogeneity in networked data. In S. Venkatasubramanian, & J. Ye (Eds.), SIAM International Conference on Data Mining 2015, SDM 2015 (pp. 388-396). (SIAM International Conference on Data Mining 2015, SDM 2015). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974010.44