TY - GEN
T1 - HiCOO
T2 - 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
AU - Li, Jiajia
AU - Sun, Jimeng
AU - Vuduc, Richard
N1 - We thank Tamara Kolda, Bora Uçar, and Shaden Smith for their constructive feedback. This research is supported by the U.S. National Science Foundation (NSF) Award Number 1533768, 2017–2018 IBM Ph.D. Fellowship Award, and the Laboratory Directed Research and Development program at Sandia National Laboratories, a multi-mission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA0003525.
PY - 2019/3/11
Y1 - 2019/3/11
AB - This paper proposes a new storage format for sparse tensors, called Hierarchical COOrdinate (HiCOO; pronounced: 'haiku'). It derives from the coordinate (COO) format, arguably the de facto standard for general sparse tensor storage. HiCOO improves upon COO by compressing the indices in units of sparse tensor blocks, with the goals of preserving the 'mode-agnostic' simplicity of COO while reducing the bytes needed to represent the tensor and promoting data locality. We evaluate HiCOO by implementing a single-node, multicore-parallel version of the matricized tensor-times-Khatri-Rao product (MTTKRP) operation, which is the most expensive computational core in the widely used CANDECOMP/PARAFAC decomposition (CPD) algorithm. This MTTKRP implementation achieves up to 23.0× (6.8× on average) speedup over the COO format and up to 15.6× (3.1× on average) speedup over another state-of-the-art format, compressed sparse fiber (CSF), while using less storage than, or storage comparable to, both formats. When used within CPD, we also observe speedups against COO- and CSF-based implementations.
UR - https://www.scopus.com/pages/publications/85064118745
UR - https://www.scopus.com/pages/publications/85064118745#tab=citedBy
U2 - 10.1109/SC.2018.00022
DO - 10.1109/SC.2018.00022
M3 - Conference contribution
AN - SCOPUS:85064118745
T3 - Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
SP - 238
EP - 252
BT - Proceedings - International Conference for High Performance Computing, Networking, Storage, and Analysis, SC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 11 November 2018 through 16 November 2018
ER -