Abstract
Parallel file systems (PFSes) and parallel I/O libraries have been the backbone of high-performance computing (HPC) infrastructures for decades. However, their crash-consistency bugs have not been extensively studied, and the corresponding bug-finding and testing tools are lacking. In this paper, we first conduct a thorough bug study of popular PFSes, such as BeeGFS and OrangeFS, with a cross-stack approach that covers HPC I/O libraries, PFSes, and their interactions with local file systems. The study results drive our design of a scalable testing framework named PFSCHECK. PFSCHECK is easy to use, as it automatically generates test cases for triggering potential crash-consistency bugs and traces essential file operations with low overhead. PFSCHECK also scales to large HPC clusters, as it exploits parallelism to facilitate the verification of persistent storage states.
Original language | English (US) |
---|---|
State | Published - 2020 |
Event | 12th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2020, co-located with USENIX ATC 2020 - Virtual, Online. Duration: Jul 13 2020 → Jul 14 2020 |
Conference
Conference | 12th USENIX Workshop on Hot Topics in Storage and File Systems, HotStorage 2020, co-located with USENIX ATC 2020 |
---|---|
City | Virtual, Online |
Period | 7/13/20 → 7/14/20 |
ASJC Scopus subject areas
- Software
- Computer Networks and Communications
- Hardware and Architecture
- Information Systems