Zorro: zero-cost reactive failure recovery in distributed graph processing

Mayank Pundir, Luke M. Leslie, Indranil Gupta, Roy H. Campbell

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Distributed graph processing systems largely rely on proactive techniques for failure recovery. Unfortunately, these approaches (such as checkpointing) entail a significant overhead. In this paper, we argue that distributed graph processing systems should instead use a reactive approach to failure recovery. The reactive approach trades off completeness of the result (generating a slightly inaccurate result) while reducing the overhead during failure-free execution to zero. We build a system called Zorro that imbues this reactive approach, and integrate Zorro into two graph processing systems - PowerGraph and LFGraph. When a failure occurs, Zorro opportunistically exploits vertex replication inherent in today's graph processing systems to quickly rebuild the state of failed servers. Experiments using real-world graphs demonstrate that Zorro is able to recover over 99% of the graph state when 6-12% of the servers fail, and between 87- 95% when half the cluster fails. Furthermore, using various graph processing algorithms, Zorro incurs little to no accuracy loss in all experimental failure scenarios, and achieves a worst-case accuracy of 97%.

Original languageEnglish (US)
Title of host publicationACM SoCC 2015 - Proceedings of the 6th ACM Symposium on Cloud Computing
EditorsMagdalena Balazinska, Michael J. Freedman, Sumita Barahmand, Shahram Ghandeharizadeh
PublisherAssociation for Computing Machinery, Inc
Pages195-208
Number of pages14
ISBN (Electronic)9781450336512
DOIs
StatePublished - Aug 27 2015
Event6th ACM Symposium on Cloud Computing, ACM SoCC 2015 - Kohala Coast, United States
Duration: Aug 27 2015Aug 29 2015

Publication series

NameACM SoCC 2015 - Proceedings of the 6th ACM Symposium on Cloud Computing

Other

Other6th ACM Symposium on Cloud Computing, ACM SoCC 2015
CountryUnited States
CityKohala Coast
Period8/27/158/29/15

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Theoretical Computer Science

Fingerprint Dive into the research topics of 'Zorro: zero-cost reactive failure recovery in distributed graph processing'. Together they form a unique fingerprint.

  • Cite this

    Pundir, M., Leslie, L. M., Gupta, I., & Campbell, R. H. (2015). Zorro: zero-cost reactive failure recovery in distributed graph processing. In M. Balazinska, M. J. Freedman, S. Barahmand, & S. Ghandeharizadeh (Eds.), ACM SoCC 2015 - Proceedings of the 6th ACM Symposium on Cloud Computing (pp. 195-208). (ACM SoCC 2015 - Proceedings of the 6th ACM Symposium on Cloud Computing). Association for Computing Machinery, Inc. https://doi.org/10.1145/2806777.2806934