Redundancy does not imply fault tolerance: Analysis of distributed storage reactions to single errors and corruptions

Aishwarya Ganesan, Ramnatthan Alagappan, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous bugs related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. Our results have implications for the design of next-generation fault-tolerant distributed and cloud storage systems.
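The abstract's central point, that holding redundant copies is not the same as using them to recover, can be illustrated with a toy model. The sketch that follows is hypothetical: it is not the study's fault-injection methodology or code from any of the systems it evaluated. It assumes a three-way replicated block store (the names `ReplicatedStore`, `Replica`, `get_naive`, and `get_checked` are invented for illustration) that keeps a CRC32 checksum next to each block. A single injected corruption on one replica is silently returned by a read path that trusts the local copy. A read path that verifies checksums detects the same fault and repairs it from a healthy replica.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical per-node block store: block id -> (data, CRC32 of data)."""
    blocks: dict = field(default_factory=dict)

    def write(self, block_id: str, data: bytes) -> None:
        # Store the data together with a checksum computed at write time.
        self.blocks[block_id] = (data, zlib.crc32(data))

    def corrupt(self, block_id: str) -> None:
        # Inject a single fault: invert the first byte but keep the old checksum,
        # mimicking silent corruption beneath the storage system.
        # Assumes the block is non-empty.
        data, crc = self.blocks[block_id]
        self.blocks[block_id] = (bytes([data[0] ^ 0xFF]) + data[1:], crc)


class ReplicatedStore:
    """Hypothetical store that writes every block to n replicas."""

    def __init__(self, n: int = 3) -> None:
        self.replicas = [Replica() for _ in range(n)]

    def put(self, block_id: str, data: bytes) -> None:
        for replica in self.replicas:
            replica.write(block_id, data)

    def get_naive(self, block_id: str) -> bytes:
        # Trusts the local copy (replica 0) without checking it:
        # the redundant copies exist but are never consulted.
        data, _ = self.replicas[0].blocks[block_id]
        return data

    def get_checked(self, block_id: str) -> bytes:
        # Verify each copy's checksum in turn; return the first intact copy
        # and overwrite any corrupted copies encountered before it.
        corrupted = []
        for replica in self.replicas:
            data, crc = replica.blocks[block_id]
            if zlib.crc32(data) == crc:
                for bad in corrupted:
                    bad.write(block_id, data)
                return data
            corrupted.append(replica)
        raise IOError(f"every replica of block {block_id!r} is corrupted")


if __name__ == "__main__":
    store = ReplicatedStore()
    store.put("b0", b"hello")
    store.replicas[0].corrupt("b0")      # one fault on one replica
    print(store.get_naive("b0"))         # b'\x97ello' (corrupted data returned)
    print(store.get_checked("b0"))       # b'hello' (detected and repaired)
```

Both read paths see the same three copies. Only the one that detects the fault and then consults the other replicas turns redundancy into recovery. This toy model deliberately ignores harder cases, such as damaged checksums or replicas that disagree.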

Original language: English (US)
Title of host publication: Proceedings of the 15th USENIX Conference on File and Storage Technologies, FAST 2017
Publisher: USENIX Association
Pages: 149-165
Number of pages: 17
ISBN (Electronic): 9781931971362
State: Published, 2017
Externally published: Yes
Event: 15th USENIX Conference on File and Storage Technologies, FAST 2017, Santa Clara, United States
Duration: Feb 27 2017 to Mar 2 2017

Conference

Conference: 15th USENIX Conference on File and Storage Technologies, FAST 2017
Country/Territory: United States
City: Santa Clara
Period: 2/27/17 to 3/2/17

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software
  • Computer Networks and Communications
