Abstract
We introduce protocol-aware recovery (PAR), a new approach that exploits protocol-specific knowledge to correctly recover from storage faults in distributed systems. We demonstrate the efficacy of PAR through the design and implementation of corruption-tolerant replication (CTRL), a PAR mechanism specific to replicated state machine (RSM) systems. We experimentally show that the CTRL versions of two systems, LogCabin and ZooKeeper, safely recover from storage faults and provide high availability, while the unmodified versions can lose data or become unavailable. We also show that the CTRL versions have little performance overhead.
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 16th USENIX Conference on File and Storage Technologies, FAST 2018 |
Publisher | USENIX Association |
Pages | 15-31 |
Number of pages | 17 |
ISBN (Electronic) | 9781931971423 |
State | Published - 2018 |
Externally published | Yes |
Event | 16th USENIX Conference on File and Storage Technologies, FAST 2018 - Oakland, United States; Duration: Feb 12 2018 → Feb 15 2018 |
Conference
Conference | 16th USENIX Conference on File and Storage Technologies, FAST 2018 |
---|---|
Country/Territory | United States |
City | Oakland |
Period | Feb 12 2018 → Feb 15 2018 |
ASJC Scopus subject areas
- Hardware and Architecture
- Software
- Computer Networks and Communications