TY - JOUR
T1 - TUCKET
T2 - 51st International Conference on Very Large Data Bases, VLDB 2025
AU - Qiu, Ruizhong
AU - Jang, Jun Gi
AU - Lin, Xiao
AU - Liu, Lihui
AU - Tong, Hanghang
N1 - This work was supported by NSF (2316233), NIFA (2020-67021-32799), and IBM\u2013Illinois Discovery Accelerator Institute. The content of the information in this document does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on. Jun-Gi Jang was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (RS-2023-00238596).
PY - 2024
Y1 - 2024
N2 - Given an evolving tensor time series and multiple time ranges, how can we compute Tucker decomposition for each time range efficiently and accurately? Tucker decomposition has been widely used in a variety of applications to obtain latent factors of tensor data. For example, Tucker decomposition on air pollution data allows us to analyze and compare air pollution patterns between different locations during different periods of time. In these applications, a common need is to compute Tucker decomposition for a given time range. Furthermore, real-world tensor time series are typically evolving in the time dimension. Such needs call for a data structure that can efficiently and accurately support range queries of Tucker decomposition and stream updates. Unfortunately, existing methods do not support either range queries or stream updates. For methods that do not support range queries, they have to re-compute from scratch for each query. Not until 2021 has a data structure called Zoom-Tucker been proposed to support range queries via block-wise preprocessing. However, Zoom-Tucker does not support stream updates and, more critically, suffers from a reluctant efficiency–accuracy tradeoff — a large block size causes inaccuracy, while a small block size leads to inefficiency. This challenging problem has remained open for years prior to our work. To solve this challenging problem, we propose TUCKET, a data structure that can efficiently and accurately handle both range queries and stream updates. Our key idea is to design a new data structure that we call a stream segment tree by generalizing the segment tree, a data structure that was originally invented for computational geometry. For a range query of length L, our TUCKET can find O(log L) nodes (called the hit set) from the tree and efficiently stitch their preprocessed decompositions to answer the range query. We also propose an algorithm to optimally prune the hit set via an approximation of subtensor decomposition. For the T-th stream update, our TUCKET modifies only amortized O(1) nodes and only O(logT) nodes in the worst case. Extensive evaluation demonstrates that our TUCKET consistently achieves the highest efficiency and accuracy across four large-scale datasets. Our TUCKET achieves at least 3 times lower latency and at least 1.4 times smaller reconstruction error than Zoom-Tucker on all datasets. The full version can be found at https://github.com/q-rz/TUCKET/blob/main/TUCKET-Full.pdf.
AB - Given an evolving tensor time series and multiple time ranges, how can we compute Tucker decomposition for each time range efficiently and accurately? Tucker decomposition has been widely used in a variety of applications to obtain latent factors of tensor data. For example, Tucker decomposition on air pollution data allows us to analyze and compare air pollution patterns between different locations during different periods of time. In these applications, a common need is to compute Tucker decomposition for a given time range. Furthermore, real-world tensor time series are typically evolving in the time dimension. Such needs call for a data structure that can efficiently and accurately support range queries of Tucker decomposition and stream updates. Unfortunately, existing methods do not support either range queries or stream updates. For methods that do not support range queries, they have to re-compute from scratch for each query. Not until 2021 has a data structure called Zoom-Tucker been proposed to support range queries via block-wise preprocessing. However, Zoom-Tucker does not support stream updates and, more critically, suffers from a reluctant efficiency–accuracy tradeoff — a large block size causes inaccuracy, while a small block size leads to inefficiency. This challenging problem has remained open for years prior to our work. To solve this challenging problem, we propose TUCKET, a data structure that can efficiently and accurately handle both range queries and stream updates. Our key idea is to design a new data structure that we call a stream segment tree by generalizing the segment tree, a data structure that was originally invented for computational geometry. For a range query of length L, our TUCKET can find O(log L) nodes (called the hit set) from the tree and efficiently stitch their preprocessed decompositions to answer the range query. We also propose an algorithm to optimally prune the hit set via an approximation of subtensor decomposition. For the T-th stream update, our TUCKET modifies only amortized O(1) nodes and only O(logT) nodes in the worst case. Extensive evaluation demonstrates that our TUCKET consistently achieves the highest efficiency and accuracy across four large-scale datasets. Our TUCKET achieves at least 3 times lower latency and at least 1.4 times smaller reconstruction error than Zoom-Tucker on all datasets. The full version can be found at https://github.com/q-rz/TUCKET/blob/main/TUCKET-Full.pdf.
UR - http://www.scopus.com/inward/record.url?scp=85217077887&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85217077887&partnerID=8YFLogxK
U2 - 10.14778/3704965.3704980
DO - 10.14778/3704965.3704980
M3 - Conference article
AN - SCOPUS:85217077887
SN - 2150-8097
VL - 17
SP - 4746
EP - 4759
JO - Proceedings of the VLDB Endowment
JF - Proceedings of the VLDB Endowment
IS - 13
Y2 - 1 September 2025 through 5 September 2025
ER -