TY - GEN
T1 - User-Guided Clustering in Heterogeneous Information Networks via Motif-Based Comprehensive Transcription
AU - Shi, Yu
AU - He, Xinwei
AU - Zhang, Naijing
AU - Yang, Carl
AU - Han, Jiawei
N1 - Publisher Copyright:
© Springer Nature Switzerland AG 2020.
PY - 2020
Y1 - 2020
N2 - Heterogeneous information networks (HINs) with rich semantics are ubiquitous in real-world applications. For a given HIN, many reasonable clustering results with distinct semantic meaning can simultaneously exist. User-guided clustering is hence of great practical value for HINs where users provide labels to a small portion of nodes. To cater to a broad spectrum of user guidance evidenced by different expected clustering results, carefully exploiting the signals residing in the data is potentially useful. Meanwhile, as one type of complex networks, HINs often encapsulate higher-order interactions that reflect the interlocked nature among nodes and edges. Network motifs, sometimes referred to as meta-graphs, have been used as tools to capture such higher-order interactions and reveal the many different semantics. We therefore approach the problem of user-guided clustering in HINs with network motifs. In this process, we identify the utility and importance of directly modeling higher-order interactions without collapsing them to pairwise interactions. To achieve this, we comprehensively transcribe the higher-order interaction signals to a series of tensors via motifs and propose the MoCHIN model based on joint non-negative tensor factorization. This approach applies to arbitrarily many, arbitrary forms of HIN motifs. An inference algorithm with speed-up methods is also proposed to tackle the challenge that tensor size grows exponentially as the number of nodes in a motif increases. We validate the effectiveness of the proposed method on two real-world datasets and three tasks, and MoCHIN outperforms all baselines in three evaluation tasks under three different metrics. Additional experiments demonstrated the utility of motifs and the benefit of directly modeling higher-order information especially when user guidance is limited. (The code and the data are available at https://github.com/NoSegfault/MoCHIN.)
AB - Heterogeneous information networks (HINs) with rich semantics are ubiquitous in real-world applications. For a given HIN, many reasonable clustering results with distinct semantic meaning can simultaneously exist. User-guided clustering is hence of great practical value for HINs where users provide labels to a small portion of nodes. To cater to a broad spectrum of user guidance evidenced by different expected clustering results, carefully exploiting the signals residing in the data is potentially useful. Meanwhile, as one type of complex networks, HINs often encapsulate higher-order interactions that reflect the interlocked nature among nodes and edges. Network motifs, sometimes referred to as meta-graphs, have been used as tools to capture such higher-order interactions and reveal the many different semantics. We therefore approach the problem of user-guided clustering in HINs with network motifs. In this process, we identify the utility and importance of directly modeling higher-order interactions without collapsing them to pairwise interactions. To achieve this, we comprehensively transcribe the higher-order interaction signals to a series of tensors via motifs and propose the MoCHIN model based on joint non-negative tensor factorization. This approach applies to arbitrarily many, arbitrary forms of HIN motifs. An inference algorithm with speed-up methods is also proposed to tackle the challenge that tensor size grows exponentially as the number of nodes in a motif increases. We validate the effectiveness of the proposed method on two real-world datasets and three tasks, and MoCHIN outperforms all baselines in three evaluation tasks under three different metrics. Additional experiments demonstrated the utility of motifs and the benefit of directly modeling higher-order information especially when user guidance is limited. (The code and the data are available at https://github.com/NoSegfault/MoCHIN.)
KW - Heterogeneous information networks
KW - Higher-order interactions
KW - Network motifs
KW - Non-negative tensor factorization
KW - User-guided clustering
UR - http://www.scopus.com/inward/record.url?scp=85084793607&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084793607&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-46150-8_22
DO - 10.1007/978-3-030-46150-8_22
M3 - Conference contribution
AN - SCOPUS:85084793607
SN - 9783030461492
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 361
EP - 377
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings
A2 - Brefeld, Ulf
A2 - Fromont, Elisa
A2 - Hotho, Andreas
A2 - Knobbe, Arno
A2 - Maathuis, Marloes
A2 - Robardet, Céline
PB - Springer
T2 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2019
Y2 - 16 September 2019 through 20 September 2019
ER -