TY - GEN
T1 - Can applications recover from fsync failures?
AU - Rebello, Anthony
AU - Patel, Yuvraj
AU - Alagappan, Ramnatthan
AU - Arpaci-Dusseau, Andrea C.
AU - Arpaci-Dusseau, Remzi H.
N1 - Funding Information:
We thank Peter Macko (our shepherd), the anonymous reviewers of ATC ’20, and the members of ADSL for their insightful comments and suggestions. We thank CloudLab [32] for providing a great environment to run our experiments. We also thank our sponsors: VMware, NetApp, and Intel, for their generous support. This material was also supported by funding from NSF grants CNS-1421033, CNS-1763810 and CNS-1838733, and DOE grant DE-SC0014935. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF, DOE, or any other institutions.
Publisher Copyright:
Copyright © Proc. of the 2020 USENIX Annual Technical Conference, ATC 2020. All rights reserved.
PY - 2020
Y1 - 2020
N2 - We analyze how file systems and modern data-intensive applications react to fsync failures. First, we characterize how three Linux file systems (ext4, XFS, Btrfs) behave in the presence of failures. We find commonalities across file systems (pages are always marked clean, certain block writes always lead to unavailability), as well as differences (page content and failure reporting vary). Next, we study how five widely used applications (PostgreSQL, LMDB, LevelDB, SQLite, Redis) handle fsync failures. Our findings show that although applications use many failure-handling strategies, none are sufficient: fsync failures can cause catastrophic outcomes such as data loss and corruption. Our findings have strong implications for the design of file systems and applications that intend to provide strong durability guarantees.
UR - http://www.scopus.com/inward/record.url?scp=85091892839&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85091892839&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85091892839
SP - 753
EP - 767
BT - Proceedings of the 2020 USENIX Annual Technical Conference, ATC 2020
PB - USENIX Association
T2 - 2020 USENIX Annual Technical Conference, ATC 2020
Y2 - 15 July 2020 through 17 July 2020
ER -