TY - GEN
T1 - Optimizing Scientific Data Transfer on Globus with Error-Bounded Lossy Compression
AU - Liu, Yuanjian
AU - Di, Sheng
AU - Chard, Kyle
AU - Foster, Ian
AU - Cappello, Franck
N1 - The material was supported by the U.S. Department of Energy, Office of Science, Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357, and by the National Science Foundation under Grants OAC-2003709 and OAC-2104023. We acknowledge the computing resources provided on Bebop (operated by the Laboratory Computing Resource Center at Argonne).
PY - 2023
Y1 - 2023
N2 - The increasing volume and velocity of science data necessitate the frequent movement of enormous data volumes as part of routine research activities. As a result, limited wide-area bandwidth often leads to bottlenecks in research progress. However, in many cases, consuming applications (e.g., for analysis, visualization, and machine learning) can achieve acceptable performance on reduced-precision data, and thus researchers may wish to compromise on data precision to reduce transfer and storage costs. Error-bounded lossy compression presents a promising approach as it can significantly reduce data volumes while preserving data integrity based on user-specified error bounds. In this paper, we propose a novel data transfer framework called Ocelot that integrates error-bounded lossy compression into the Globus data transfer infrastructure. We note four key contributions: (1) Ocelot is the first integration of lossy compression in Globus to significantly improve scientific data transfer performance over wide area networks (WANs). (2) We propose an effective machine-learning-based lossy compression quality estimation model that can predict the quality of error-bounded lossy compressors, which is fundamental to ensuring that transferred data are acceptable to users. (3) We develop optimized strategies to reduce the compression time overhead, counter the compute-node waiting time, and improve transfer speed for compressed files. (4) We perform evaluations using many real-world scientific applications across different domains and distributed Globus endpoints. Our experiments show that Ocelot can improve dataset transfer performance substantially, and that the quality of lossy compression (time, ratio, and data distortion) can be predicted accurately for the purpose of quality assurance.
KW - Data Transfer
KW - Globus
KW - Lossy Compression
KW - Performance
KW - WAN
UR - http://www.scopus.com/inward/record.url?scp=85175017409&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175017409&partnerID=8YFLogxK
U2 - 10.1109/ICDCS57875.2023.00064
DO - 10.1109/ICDCS57875.2023.00064
M3 - Conference contribution
AN - SCOPUS:85175017409
T3 - Proceedings - International Conference on Distributed Computing Systems
SP - 703
EP - 713
BT - Proceedings - 2023 IEEE 43rd International Conference on Distributed Computing Systems, ICDCS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 43rd IEEE International Conference on Distributed Computing Systems, ICDCS 2023
Y2 - 18 July 2023 through 21 July 2023
ER -