Abstract
Many real-world systems consist of several types of entities, and heterogeneous networks are required to represent such systems. However, the current statistical toolbox for network data can only handle homogeneous networks, in which all nodes are assumed to be of the same type. This article introduces a statistical framework for community detection in heterogeneous networks. For modeling heterogeneous networks, we propose heterogeneous versions of both the classical stochastic blockmodel and the degree-corrected blockmodel. For community detection, we formulate heterogeneous versions of standard spectral clustering and regularized spectral clustering. We demonstrate the theoretical accuracy of the proposed heterogeneous methods for networks generated from the proposed heterogeneous models. Our simulations establish the superiority of the proposed heterogeneous methods over existing homogeneous methods in finite networks generated from the models. An analysis of the DBLP four-area data demonstrates the improved accuracy of the heterogeneous method over the homogeneous method in identifying research areas for authors.
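For readers unfamiliar with the homogeneous baseline that the abstract refers to, the following is a minimal sketch of the classical stochastic blockmodel together with a two-community spectral partition. It is not the article's heterogeneous method, which the abstract does not specify. The function names, the edge probabilities `p_in` and `p_out`, and the network size are illustrative assumptions, and the sign split on the second eigenvector is a simplification of standard spectral clustering, which typically runs k-means on the leading eigenvectors.

```python
import random


def sample_sbm(labels, p_in, p_out, rng):
    """Sample a symmetric 0/1 adjacency matrix from a classical SBM.

    Nodes in the same community connect with probability p_in,
    nodes in different communities with probability p_out.
    """
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = p_in if labels[i] == labels[j] else p_out
            if rng.random() < p:
                adj[i][j] = adj[j][i] = 1
    return adj


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    norm = dot(v, v) ** 0.5
    return [x / norm for x in v]


def power_iteration(adj, rng, orthogonal_to=None, iters=200):
    """Approximate a dominant eigenvector of adj by power iteration.

    If orthogonal_to (a unit vector) is given, the iterate is projected
    away from it at every step, which targets the next eigenvector.
    """
    v = [rng.random() - 0.5 for _ in adj]
    for _ in range(iters):
        if orthogonal_to is not None:
            c = dot(v, orthogonal_to)
            v = [a - c * b for a, b in zip(v, orthogonal_to)]
        v = normalize([dot(row, v) for row in adj])
    return v


def spectral_bipartition(adj, rng):
    """Split nodes into two groups by the sign of the second eigenvector."""
    first = power_iteration(adj, rng)
    second = power_iteration(adj, rng, orthogonal_to=first)
    return [0 if x >= 0 else 1 for x in second]


def accuracy(truth, guess):
    """Fraction of correctly labeled nodes, up to swapping the two labels."""
    agree = sum(t == g for t, g in zip(truth, guess))
    return max(agree, len(truth) - agree) / len(truth)


# Illustrative setup (assumed values): two equal communities of 30 nodes.
rng = random.Random(0)
truth = [0] * 30 + [1] * 30
adj = sample_sbm(truth, p_in=0.7, p_out=0.1, rng=rng)
guess = spectral_bipartition(adj, rng)
print(accuracy(truth, guess))
```

The sketch treats every node as the same type, which is exactly the homogeneous assumption the article sets out to relax.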
Field | Value |
---|---|
Original language | English (US) |
Pages (from-to) | 1081-1106 |
Number of pages | 26 |
Journal | Statistica Sinica |
Volume | 25 |
Issue number | 3 |
State | Published - Jul 2015 |
Keywords
- Clustering
- Community detection
- Degree-corrected blockmodel
- Heterogeneous network
- Network analysis
- Regularized spectral clustering
- Spectral clustering
- Stochastic blockmodel
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty