TY - GEN
T1 - Accelerating Sparse Data Orchestration via Dynamic Reflexive Tiling
AU - Odemuyiwa, Toluwanimi O.
AU - Asghari-Moghaddam, Hadi
AU - Pellauer, Michael
AU - Hegde, Kartik
AU - Tsai, Po-An
AU - Crago, Neal C.
AU - Jaleel, Aamer
AU - Owens, John D.
AU - Solomonik, Edgar
AU - Emer, Joel S.
AU - Fletcher, Christopher W.
N1 - The authors appreciate the financial support from a Microsoft Research Fellowship and a Facebook PhD Fellowship. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-18-3-0007, and the National Science Foundation (NSF) under Contract No. 1909999 and Contract No. 1942888. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the U.S. Government.
PY - 2023/3/25
Y1 - 2023/3/25
N2 - Tensor algebra involving multiple sparse operands is severely memory bound, making it a challenging target for acceleration. Furthermore, irregular sparsity complicates traditional techniques, such as tiling, for ameliorating memory bottlenecks. Prior sparse tiling schemes are sparsity-unaware: they carve tensors into uniform coordinate-space shapes, which leads to low-occupancy tiles and thus lower exploitable reuse. To address these challenges, this paper proposes dynamic reflexive tiling (DRT), a novel tiling method that improves data reuse over prior art for sparse tensor kernels, unlocking significant performance improvement opportunities. DRT's key idea is dynamic sparsity-aware tiling. DRT continuously re-tiles sparse tensors at runtime based on the current sparsity of the active regions of all input tensors, to maximize accelerator buffer utilization while retaining the ability to co-iterate through tiles of distinct tensors. Through an extensive evaluation over a set of SuiteSparse matrices, we show how DRT can be applied to multiple prior accelerators with different dataflows (ExTensor, OuterSPACE, MatRaptor), improving their performance (by 3.3×, 5.1×, and 1.6×, respectively) while adding negligible area overhead. We apply DRT to higher-order tensor kernels to reduce DRAM traffic by 3.9× and 16.9× over a CPU implementation and a prior-art tiling scheme, respectively. Finally, we show that the technique is portable to software, with an improvement of 7.29× and 2.94× in memory overhead compared to untiled sparse-sparse matrix multiplication (SpMSpM).
KW - Hardware Acceleration
KW - Sparse Computation
KW - Tensor Algebra
UR - http://www.scopus.com/inward/record.url?scp=85159287901&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159287901&partnerID=8YFLogxK
U2 - 10.1145/3582016.3582064
DO - 10.1145/3582016.3582064
M3 - Conference contribution
AN - SCOPUS:85159287901
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 18
EP - 32
BT - ASPLOS 2023 - Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
A2 - Aamodt, Tor M.
A2 - Jerger, Natalie Enright
A2 - Swift, Michael
PB - Association for Computing Machinery
T2 - 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2023
Y2 - 25 March 2023 through 29 March 2023
ER -