TY - GEN
T1 - HotTiles
T2 - 30th IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
AU - Gerogiannis, Gerasimos
AU - Aananthakrishnan, Sriram
AU - Torrellas, Josep
AU - Hur, Ibrahim
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - Sparse Matrix Dense Matrix Multiplication (SpMM) is an important kernel with application across a wide range of domains, including machine learning and linear algebra solvers. In many sparse matrices, the pattern of nonzeros is nonuniform: nonzeros form dense and sparse regions, rather than being uniformly distributed across the whole matrix. We refer to this property as Intra-Matrix Heterogeneity (IMH). Currently, SpMM accelerator designs do not leverage this heterogeneity. They employ the same processing elements (PEs) for all the regions of a sparse matrix, resulting in suboptimal acceleration. To address this limitation, we utilize heterogeneous SpMM accelerator architectures, which include different types of PEs to exploit IMH. We develop an analytical modeling framework to predict the performance of different types of accelerator PEs taking into account IMH. Furthermore, we present a heuristic for partitioning sparse matrices among heterogeneous PEs. We call our matrix modeling and partitioning method HotTiles. To evaluate HotTiles, we simulate three different heterogeneous architectures. Each one consists of two types of workers (i.e., PEs): one suited for compute-bound denser regions (Hot Worker) and one for memory-bound sparser regions (Cold Worker). Our results show that exploiting IMH with HotTiles is very effective. Depending on the architecture, heterogeneous execution with HotTiles outperforms homogeneous execution using only hot or only cold workers by 9.2-16.8x and 1.4-3.7x, respectively. In addition, HotTiles outperforms the best worker type used on a per-matrix basis by 1.3-2.5x. Finally, HotTiles outperforms an IMH-unaware heterogeneous execution strategy by 1.4-2.2x.
KW - hardware accelerators
KW - heterogeneous computing
KW - sparse computations
KW - SpMM
UR - http://www.scopus.com/inward/record.url?scp=85190233780&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85190233780&partnerID=8YFLogxK
U2 - 10.1109/HPCA57654.2024.00081
DO - 10.1109/HPCA57654.2024.00081
M3 - Conference contribution
AN - SCOPUS:85190233780
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 1012
EP - 1028
BT - Proceedings - 2024 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2024
PB - IEEE Computer Society
Y2 - 2 March 2024 through 6 March 2024
ER -