An experimental comparison of partitioning strategies in distributed graph processing

Shiv Verma, Luke M. Leslie, Yosub Shin, Indranil Gupta

Research output: Contribution to journal › Conference article

Abstract

In this paper, we study the problem of choosing among partitioning strategies in distributed graph processing systems. To this end, we evaluate and characterize both the performance and resource usage of different partitioning strategies under various popular distributed graph processing systems, applications, input graphs, and execution environments. Through our experiments, we found that no single partitioning strategy is the best fit for all situations, and that the choice of partitioning strategy has a significant effect on resource usage and application run-time. Our experiments demonstrate that the choice of partitioning strategy depends on (1) the degree distribution of the input graph, (2) the type and duration of the application, and (3) the cluster size. Based on our results, we present rules of thumb to help users pick the best partitioning strategy for their particular use cases. We present results from each system, as well as from all partitioning strategies implemented in one common system (PowerLyra).

Original language: English (US)
Pages (from-to): 493-504
Number of pages: 12
Journal: Proceedings of the VLDB Endowment
Volume: 10
Issue number: 5
DOI: 10.14778/3055540.3055543
State: Published - Jan 1 2016
Event: 43rd International Conference on Very Large Data Bases, VLDB 2017 - Munich, Germany
Duration: Aug 28 2017 to Sep 1 2017

ASJC Scopus subject areas

  • Computer Science (miscellaneous)
  • Computer Science (all)

Cite this

An experimental comparison of partitioning strategies in distributed graph processing. / Verma, Shiv; Leslie, Luke M.; Shin, Yosub; Gupta, Indranil.

In: Proceedings of the VLDB Endowment, Vol. 10, No. 5, 01.01.2016, p. 493-504.

@article{4e56cc0eede44818995bed0fa403d8e8,
title = "An experimental comparison of partitioning strategies in distributed graph processing",
abstract = "In this paper, we study the problem of choosing among partitioning strategies in distributed graph processing systems. To this end, we evaluate and characterize both the performance and resource usage of different partitioning strategies under various popular distributed graph processing systems, applications, input graphs, and execution environments. Through our experiments, we found that no single partitioning strategy is the best fit for all situations, and that the choice of partitioning strategy has a significant effect on resource usage and application run-time. Our experiments demonstrate that the choice of partitioning strategy depends on (1) the degree distribution of input graph, (2) the type and duration of the application, and (3) the cluster size. Based on our results, we present rules of thumb to help users pick the best partitioning strategy for their particular use cases. We present results from each system, as well as from all partitioning strategies implemented in one common system (PowerLyra).",
author = "Verma, Shiv and Leslie, Luke M. and Shin, Yosub and Gupta, Indranil",
year = "2016",
month = "1",
day = "1",
doi = "10.14778/3055540.3055543",
language = "English (US)",
volume = "10",
pages = "493--504",
journal = "Proceedings of the VLDB Endowment",
issn = "2150-8097",
publisher = "Very Large Data Base Endowment Inc.",
number = "5"
}

TY - JOUR

T1 - An experimental comparison of partitioning strategies in distributed graph processing

AU - Verma, Shiv

AU - Leslie, Luke M.

AU - Shin, Yosub

AU - Gupta, Indranil

PY - 2016/1/1

Y1 - 2016/1/1

N2 - In this paper, we study the problem of choosing among partitioning strategies in distributed graph processing systems. To this end, we evaluate and characterize both the performance and resource usage of different partitioning strategies under various popular distributed graph processing systems, applications, input graphs, and execution environments. Through our experiments, we found that no single partitioning strategy is the best fit for all situations, and that the choice of partitioning strategy has a significant effect on resource usage and application run-time. Our experiments demonstrate that the choice of partitioning strategy depends on (1) the degree distribution of input graph, (2) the type and duration of the application, and (3) the cluster size. Based on our results, we present rules of thumb to help users pick the best partitioning strategy for their particular use cases. We present results from each system, as well as from all partitioning strategies implemented in one common system (PowerLyra).

UR - http://www.scopus.com/inward/record.url?scp=85020390625&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85020390625&partnerID=8YFLogxK

U2 - 10.14778/3055540.3055543

DO - 10.14778/3055540.3055543

M3 - Conference article

AN - SCOPUS:85020390625

VL - 10

SP - 493

EP - 504

JO - Proceedings of the VLDB Endowment

JF - Proceedings of the VLDB Endowment

SN - 2150-8097

IS - 5

ER -