TY - GEN
T1 - Fine-grained Policy-driven I/O Sharing for Burst Buffers
AU - Karrels, Ed
AU - Huang, Lei
AU - Kan, Yuhong
AU - Arora, Ishank
AU - Wang, Yinzhi
AU - Katz, Daniel S.
AU - Gropp, William D.
AU - Zhang, Zhao
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/11/12
Y1 - 2023/11/12
AB - A burst buffer is a common method to bridge the performance gap between the I/O needs of modern supercomputing applications and the performance of the shared file system on large-scale supercomputers. However, existing I/O sharing methods require resource isolation, offline profiling, or repeated execution that significantly limit the utilization and applicability of these systems. Here we present ThemisIO, a policy-driven I/O sharing framework for a remote-shared burst buffer: a dedicated group of I/O nodes, each with a local storage device. ThemisIO preserves high utilization by implementing opportunity fairness so that it can reallocate unused I/O resources to other applications. ThemisIO accurately and efficiently allocates I/O cycles among applications, purely based on real-time I/O behavior without requiring user-supplied information or offline-profiled application characteristics. ThemisIO supports a variety of fair sharing policies, such as user-fair, size-fair, as well as composite policies, e.g., group-then-user-fair. All these features are enabled by its statistical token design. ThemisIO can alter the execution order of incoming I/O requests based on assigned tokens to precisely balance I/O cycles between applications via time slicing, thereby enforcing processing isolation. Experiments using I/O benchmarks show that ThemisIO sustains 13.5-13.7% higher I/O throughput and 19.5-40.4% lower performance variation than existing algorithms. For real applications, ThemisIO significantly reduces the slowdown by 59.1-99.8% caused by I/O interference.
UR - http://www.scopus.com/inward/record.url?scp=85190412701&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190412701&partnerID=8YFLogxK
U2 - 10.1145/3581784.3607041
DO - 10.1145/3581784.3607041
M3 - Conference contribution
AN - SCOPUS:85190412701
T3 - International Conference for High Performance Computing, Networking, Storage and Analysis, SC
BT - SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
PB - Association for Computing Machinery
T2 - 2023 International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2023
Y2 - 12 November 2023 through 17 November 2023
ER -