TY - GEN
T1 - A local algorithm for structure-preserving graph cut
AU - Zhou, Dawei
AU - Zhang, Si
AU - Yildirim, Mehmet Yigit
AU - Alcorn, Scott
AU - Tong, Hanghang
AU - Davulcu, Hasan
AU - He, Jingrui
N1 - Publisher Copyright:
© 2017 Association for Computing Machinery.
PY - 2017/8/13
Y1 - 2017/8/13
N2 - Nowadays, large-scale graph data is being generated in a variety of real-world applications, from social networks to co-authorship networks, from protein-protein interaction networks to road traf-fic networks. Many existing works on graph mining focus on the vertices and edges, with the first-order Markov chain as the underlying model. They fail to explore the high-order network structures, which are of key importance in many high impact domains. For example, in bank customer personally identifable information (PII) networks, the star structures often correspond to a set of synthetic identities; in financial transaction networks, the loop structures may indicate the existence of money laundering. In this paper, we focus on mining user-specified high-order network structures and aim to find a structure-rich subgraph which does not break many such structures by separating the subgraph from the rest. A key challenge associated with finding a structure-rich subgraph is the prohibitive computational cost. To address this problem, inspired by the family of local graph clustering algorithms for eficiently identifying a low-conductance cut without exploring the entire graph, we propose to generalize the key idea to model high-order network structures. In particular, we start with a generic definition of high-order conductance, and define the high-order diffusion core, which is based on a high-order random walk induced by user-specified high-order network structure. Then we propose a novel High-Order Structure-Preserving LOcal Cut (HOS-PLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until a subgraph with a small high-order conductance is found. Furthermore, we analyze its performance in terms of both effectiveness and eficiency. The experimental results on both synthetic graphs and real graphs demonstrate the effectiveness and eficiency of our proposed HOS-PLOC algorithm.
AB - Nowadays, large-scale graph data is being generated in a variety of real-world applications, from social networks to co-authorship networks, from protein-protein interaction networks to road traf-fic networks. Many existing works on graph mining focus on the vertices and edges, with the first-order Markov chain as the underlying model. They fail to explore the high-order network structures, which are of key importance in many high impact domains. For example, in bank customer personally identifable information (PII) networks, the star structures often correspond to a set of synthetic identities; in financial transaction networks, the loop structures may indicate the existence of money laundering. In this paper, we focus on mining user-specified high-order network structures and aim to find a structure-rich subgraph which does not break many such structures by separating the subgraph from the rest. A key challenge associated with finding a structure-rich subgraph is the prohibitive computational cost. To address this problem, inspired by the family of local graph clustering algorithms for eficiently identifying a low-conductance cut without exploring the entire graph, we propose to generalize the key idea to model high-order network structures. In particular, we start with a generic definition of high-order conductance, and define the high-order diffusion core, which is based on a high-order random walk induced by user-specified high-order network structure. Then we propose a novel High-Order Structure-Preserving LOcal Cut (HOS-PLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until a subgraph with a small high-order conductance is found. Furthermore, we analyze its performance in terms of both effectiveness and eficiency. The experimental results on both synthetic graphs and real graphs demonstrate the effectiveness and eficiency of our proposed HOS-PLOC algorithm.
KW - High-order network structure
KW - Local clustering algorithm
UR - http://www.scopus.com/inward/record.url?scp=85029042139&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85029042139&partnerID=8YFLogxK
U2 - 10.1145/3097983.3098015
DO - 10.1145/3097983.3098015
M3 - Conference contribution
AN - SCOPUS:85029042139
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 655
EP - 664
BT - KDD 2017 - Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2017
Y2 - 13 August 2017 through 17 August 2017
ER -