TY - GEN
T1 - Protocol-aware recovery for consensus-based storage
AU - Alagappan, Ramnatthan
AU - Ganesan, Aishwarya
AU - Lee, Eric
AU - Albarghouthi, Aws
AU - Chidambaram, Vijay
AU - Arpaci-Dusseau, Andrea C.
AU - Arpaci-Dusseau, Remzi H.
N1 - We thank Mahesh Balakrishnan (our shepherd), the anonymous reviewers, and the members of ADSL for their excellent feedback. We also thank CloudLab [56] for providing a great environment to run our experiments. This material was supported by funding from NSF grants CNS-1421033 and CNS-1218405, DOE grant DE-SC0014935, and donations from EMC, Huawei, Microsoft, and VMware. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and may not reflect the views of NSF, DOE, or other institutions.
PY - 2018
Y1 - 2018
N2 - We introduce protocol-aware recovery (PAR), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems. We demonstrate the efficacy of PAR through the design and implementation of corruption-tolerant replication (CTRL), a PAR mechanism specific to replicated state machine (RSM) systems. We experimentally show that the CTRL versions of two systems, LogCabin and ZooKeeper, safely recover from storage faults and provide high availability, while the unmodified versions can lose data or become unavailable. We also show that the CTRL versions have little performance overhead.
UR - https://www.scopus.com/pages/publications/85053111378
UR - https://www.scopus.com/pages/publications/85053111378#tab=citedBy
M3 - Conference contribution
AN - SCOPUS:85053111378
SP - 15
EP - 31
BT - Proceedings of the 16th USENIX Conference on File and Storage Technologies, FAST 2018
PB - USENIX Association
T2 - 16th USENIX Conference on File and Storage Technologies, FAST 2018
Y2 - 12 February 2018 through 15 February 2018
ER -