TY - GEN
T1 - Parallel K-clique counting on GPUs
AU - Almasri, Mohammad
AU - Hajj, Izzat El
AU - Nagi, Rakesh
AU - Xiong, Jinjun
AU - Hwu, Wen Mei
N1 - Publisher Copyright:
© 2022 ACM.
PY - 2022/6/28
Y1 - 2022/6/28
N2 - Counting k-cliques in a graph is an important problem in graph analysis with many applications such as community detection and graph partitioning. Counting k-cliques is typically done by traversing search trees starting at each vertex in the graph. Parallelizing k-clique counting has been well-studied on CPUs and many solutions exist. However, there are no performant solutions for k-clique counting on GPUs. Parallelizing k-clique counting on GPUs comes with numerous challenges such as the need for extracting fine-grain multi-level parallelism, sensitivity to load imbalance, and constrained physical memory capacity. While there has been work on related problems such as finding maximal cliques and generalized sub-graph matching on GPUs, k-clique counting in particular has yet to be explored in depth. In this paper, we present the first parallel GPU solution specialized for the k-clique counting problem. Our solution supports both graph orientation and pivoting for eliminating redundant clique discovery. It incorporates both vertex-centric and edge-centric parallelization schemes for distributing work across thread blocks, and further partitions work within each thread block to extract fine-grain multi-level parallelism while tolerating load imbalance. It also includes optimizations such as binary encoding of induced sub-graphs and sub-warp partitioning to limit memory consumption and improve the utilization of execution resources. Our evaluation shows that our best GPU implementation outperforms the best state-of-the-art parallel CPU implementation by a geometric mean of 12.39X, 6.21X, and 18.99X for k = 4, 7, and 10, respectively. We also perform a detailed evaluation of the trade-offs involved in the choice of parallelization scheme, and the incremental speedup of each optimization to provide an in-depth understanding of the optimization space. The insights from our optimization flow can be useful for optimizing other clique finding and graph mining solutions on GPUs. Our code will be open-sourced to enable further research on GPU parallelization of k-clique counting and other similar graph mining algorithms.
AB - Counting k-cliques in a graph is an important problem in graph analysis with many applications such as community detection and graph partitioning. Counting k-cliques is typically done by traversing search trees starting at each vertex in the graph. Parallelizing k-clique counting has been well-studied on CPUs and many solutions exist. However, there are no performant solutions for k-clique counting on GPUs. Parallelizing k-clique counting on GPUs comes with numerous challenges such as the need for extracting fine-grain multi-level parallelism, sensitivity to load imbalance, and constrained physical memory capacity. While there has been work on related problems such as finding maximal cliques and generalized sub-graph matching on GPUs, k-clique counting in particular has yet to be explored in depth. In this paper, we present the first parallel GPU solution specialized for the k-clique counting problem. Our solution supports both graph orientation and pivoting for eliminating redundant clique discovery. It incorporates both vertex-centric and edge-centric parallelization schemes for distributing work across thread blocks, and further partitions work within each thread block to extract fine-grain multi-level parallelism while tolerating load imbalance. It also includes optimizations such as binary encoding of induced sub-graphs and sub-warp partitioning to limit memory consumption and improve the utilization of execution resources. Our evaluation shows that our best GPU implementation outperforms the best state-of-the-art parallel CPU implementation by a geometric mean of 12.39X, 6.21X, and 18.99X for k = 4, 7, and 10, respectively. We also perform a detailed evaluation of the trade-offs involved in the choice of parallelization scheme, and the incremental speedup of each optimization to provide an in-depth understanding of the optimization space. The insights from our optimization flow can be useful for optimizing other clique finding and graph mining solutions on GPUs. Our code will be open-sourced to enable further research on GPU parallelization of k-clique counting and other similar graph mining algorithms.
KW - GPU
KW - Graphs
KW - K-clique counting
KW - Parallel search tree traversal
UR - http://www.scopus.com/inward/record.url?scp=85132805507&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132805507&partnerID=8YFLogxK
U2 - 10.1145/3524059.3532382
DO - 10.1145/3524059.3532382
M3 - Conference contribution
AN - SCOPUS:85132805507
T3 - Proceedings of the International Conference on Supercomputing
BT - Proceedings of the 36th ACM International Conference on Supercomputing, ICS 2022
PB - Association for Computing Machinery
T2 - 36th ACM International Conference on Supercomputing, ICS 2022
Y2 - 27 June 2022 through 30 June 2022
ER -