TY - GEN
T1 - Optimization by runtime specialization for sparse matrix-vector multiplication
AU - Kamin, Sam
AU - Garzarán, María Jesús
AU - Aktemur, Barış
AU - Xu, Danqing
AU - Yilmaz, Buse
AU - Chen, Zhongbo
PY - 2014/9/15
Y1 - 2014/9/15
N2 - Runtime specialization optimizes programs based on partial information available only at run time. It is applicable when some input data is used repeatedly while other input data varies. This technique has the potential to generate highly efficient code. In this paper, we explore the potential for obtaining speedups for sparse matrix-dense vector multiplication using runtime specialization, in the case where a single matrix is multiplied by many vectors. We experiment with five methods involving runtime specialization, comparing them to methods that do not (including Intel's MKL library). For this work, our focus is the evaluation of the speedups that can be obtained with runtime specialization, without considering the overheads of code generation. Our experiments use 23 matrices from the Matrix Market and Florida collections and run on five different machines. In 94 of those 115 cases, the specialized code runs faster than any version without specialization. Using only specialization, the average speedup with respect to Intel's MKL library ranges from 1.44x to 1.77x, depending on the machine. We have also found that the best method depends on the matrix and machine; no method is best for all matrices and machines.
KW - Performance evaluation
KW - Program specialization
KW - Sparse matrix-vector multiplication
UR - http://www.scopus.com/inward/record.url?scp=84939524573&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84939524573&partnerID=8YFLogxK
U2 - 10.1145/2658761.2658773
DO - 10.1145/2658761.2658773
M3 - Conference contribution
AN - SCOPUS:84939524573
T3 - 13th International Conference on Generative Programming: Concepts and Experiences, GPCE 2014 - Proceedings
SP - 93
EP - 102
BT - 13th International Conference on Generative Programming: Concepts and Experiences, GPCE 2014 - Proceedings
PB - Association for Computing Machinery
T2 - 13th International Conference on Generative Programming: Concepts and Experiences, GPCE 2014
Y2 - 15 September 2014 through 16 September 2014
ER -