Protocol-aware recovery for consensus-based distributed storage

Ramnatthan Alagappan, Aishwarya Ganesan, Eric Lee, Aws Albarghouthi, Vijay Chidambaram, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

Research output: Contribution to journal › Article › peer-review

Abstract

We introduce protocol-aware recovery (Par), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems. We demonstrate the efficacy of Par through the design and implementation of corruption-tolerant replication (Ctrl), a Par mechanism specific to replicated state machine (RSM) systems. We experimentally show that the Ctrl versions of two systems, LogCabin and ZooKeeper, safely recover from storage faults and provide high availability, while the unmodified versions can lose data or become unavailable. We also show that the Ctrl versions achieve this reliability with little performance overhead.

Original language: English (US)
Article number: 21
Journal: ACM Transactions on Storage
Volume: 14
Issue number: 3
State: Published - Nov 2018
Externally published: Yes

Keywords

  • Consensus
  • Data corruption
  • Fault tolerance
  • Storage faults

ASJC Scopus subject areas

  • Hardware and Architecture
