TY - JOUR
T1 - An experimental comparison of partitioning strategies in distributed graph processing
AU - Verma, Shiv
AU - Leslie, Luke M.
AU - Shin, Yosub
AU - Gupta, Indranil
N1 - Funding Information:
This work was supported in part by the following grants: NSF CNS 1319527, NSF CNS 1409416, AFOSR/AFRL FA8750-11-2-0084, and a generous gift from Microsoft.
Publisher Copyright:
© 2017. VLDB Endowment.
PY - 2016
Y1 - 2016
N2 - In this paper, we study the problem of choosing among partitioning strategies in distributed graph processing systems. To this end, we evaluate and characterize both the performance and resource usage of different partitioning strategies under various popular distributed graph processing systems, applications, input graphs, and execution environments. Through our experiments, we found that no single partitioning strategy is the best fit for all situations, and that the choice of partitioning strategy has a significant effect on resource usage and application run-time. Our experiments demonstrate that the choice of partitioning strategy depends on (1) the degree distribution of the input graph, (2) the type and duration of the application, and (3) the cluster size. Based on our results, we present rules of thumb to help users pick the best partitioning strategy for their particular use cases. We present results from each system, as well as from all partitioning strategies implemented in one common system (PowerLyra).
UR - http://www.scopus.com/inward/record.url?scp=85020390625&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85020390625&partnerID=8YFLogxK
U2 - 10.14778/3055540.3055543
DO - 10.14778/3055540.3055543
M3 - Conference article
AN - SCOPUS:85020390625
VL - 10
SP - 493
EP - 504
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
SN - 2150-8097
IS - 5
T2 - 43rd International Conference on Very Large Data Bases, VLDB 2017
Y2 - 28 August 2017 through 1 September 2017
ER -