TY - JOUR
T1 - Toward Quantity-of-Interest Preserving Lossy Compression for Scientific Data
AU - Jiao, Pu
AU - Di, Sheng
AU - Guo, Hanqi
AU - Zhao, Kai
AU - Tian, Jiannan
AU - Tao, Dingwen
AU - Liang, Xin
AU - Cappello, Franck
N1 - This work was supported by the National Science Foundation under Grants OAC-2003709, OAC-2042084/2303064, OAC-2104023, and OAC-2153451. The material was supported by the U.S. Department of Energy, Office of Science and Office of Advanced Scientific Computing Research (ASCR), under contract DE-AC02-06CH11357. This research was also supported by the Exascale Computing Project (ECP), Project Number: 17-SC-20-SC, a collaborative effort of two DOE organizations – the Office of Science and the National Nuclear Security Administration, responsible for the planning and preparation of a capable exascale ecosystem, including software, applications, hardware, advanced system engineering and early testbed platforms, to support the nation’s exascale computing imperative. This work used the Foundry cluster that was supported by the National Science Foundation under Grant OAC-1919789.
PY - 2022
Y1 - 2022
N2 - Today’s scientific simulations and instruments are producing a large amount of data, leading to difficulties in storing, transmitting, and analyzing these data. While error-controlled lossy compressors are effective in significantly reducing data volumes and efficiently developing databases for multiple scientific applications, they mainly support error controls on raw data, which leaves a significant gap between the data and users’ downstream analysis. This may cause unqualified uncertainties in the outcomes of the analysis, a.k.a. quantities of interest (QoIs), which are the major concerns of users in adopting lossy compression in practice. In this paper, we propose rigorous mathematical theories, along with practical implementations, to preserve during lossy compression four families of QoIs that are widely used in scientific analysis. Specifically, we first develop the error control theory for univariate QoIs, which are essential for computing physical properties such as kinetic energy, followed by multivariate QoIs, which are more commonly used in real-world applications. The proposed method is integrated into a state-of-the-art compression framework in a modular fashion, so it can easily be adapted to new QoIs and new compression algorithms. Experiments on real-world datasets demonstrate that the proposed method provides faithful error control on important QoIs, including kinetic energy, regional average, and isosurface, without trial and error, while offering compression ratios up to 4× those of state-of-the-art compressors.
AB - Today’s scientific simulations and instruments are producing a large amount of data, leading to difficulties in storing, transmitting, and analyzing these data. While error-controlled lossy compressors are effective in significantly reducing data volumes and efficiently developing databases for multiple scientific applications, they mainly support error controls on raw data, which leaves a significant gap between the data and users’ downstream analysis. This may cause unqualified uncertainties in the outcomes of the analysis, a.k.a. quantities of interest (QoIs), which are the major concerns of users in adopting lossy compression in practice. In this paper, we propose rigorous mathematical theories, along with practical implementations, to preserve during lossy compression four families of QoIs that are widely used in scientific analysis. Specifically, we first develop the error control theory for univariate QoIs, which are essential for computing physical properties such as kinetic energy, followed by multivariate QoIs, which are more commonly used in real-world applications. The proposed method is integrated into a state-of-the-art compression framework in a modular fashion, so it can easily be adapted to new QoIs and new compression algorithms. Experiments on real-world datasets demonstrate that the proposed method provides faithful error control on important QoIs, including kinetic energy, regional average, and isosurface, without trial and error, while offering compression ratios up to 4× those of state-of-the-art compressors.
UR - http://www.scopus.com/inward/record.url?scp=85146262739&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146262739&partnerID=8YFLogxK
U2 - 10.14778/3574245.3574255
DO - 10.14778/3574245.3574255
M3 - Article
AN - SCOPUS:85146262739
SN - 2150-8097
VL - 16
SP - 697
EP - 710
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 4
ER -