TY - GEN
T1 - Optimizing matrix multiplication with a classifier learning system
AU - Li, Xiaoming
AU - Garzarán, María Jesús
N1 - Copyright:
Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2006
Y1 - 2006
N2 - Compilers have been very successful on automating the process of program optimization, but there is still a significant difference in performance between the code generated by the compiler and the hand-optimized code. Library generators such as ATLAS, SPIRAL, and FFTW address this problem by using empirical search to find the parameter values of certain optimization such as degree of unroll. We have recently developed a generator of sorting routines. Sorting differs from the algorithms implemented by other library generators in that performance of sorting depends not only on the target platform but also on the characteristics of the input data. In our work we used a classifier learning system to generate sorting routines that are capable of adapting to the input data. In this paper we follow a similar approach and use a classifier learning system to generate high performance libraries for matrix-matrix multiplication. Our library generator produces matrix multiplication routines that use recursive layouts and several levels of tiling. Our approach is to use a classifier learning system to search in the space of the different ways to partition the input matrices the one that performs the best. As a result, our system will determine the number of levels of tiling and tile size for each level depending on the target platform and the dimensions of the input matrices.
AB - Compilers have been very successful on automating the process of program optimization, but there is still a significant difference in performance between the code generated by the compiler and the hand-optimized code. Library generators such as ATLAS, SPIRAL, and FFTW address this problem by using empirical search to find the parameter values of certain optimization such as degree of unroll. We have recently developed a generator of sorting routines. Sorting differs from the algorithms implemented by other library generators in that performance of sorting depends not only on the target platform but also on the characteristics of the input data. In our work we used a classifier learning system to generate sorting routines that are capable of adapting to the input data. In this paper we follow a similar approach and use a classifier learning system to generate high performance libraries for matrix-matrix multiplication. Our library generator produces matrix multiplication routines that use recursive layouts and several levels of tiling. Our approach is to use a classifier learning system to search in the space of the different ways to partition the input matrices the one that performs the best. As a result, our system will determine the number of levels of tiling and tile size for each level depending on the target platform and the dimensions of the input matrices.
UR - http://www.scopus.com/inward/record.url?scp=43949113285&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=43949113285&partnerID=8YFLogxK
U2 - 10.1007/978-3-540-69330-7_9
DO - 10.1007/978-3-540-69330-7_9
M3 - Conference contribution
AN - SCOPUS:43949113285
SN - 3540693297
SN - 9783540693291
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 121
EP - 135
BT - Languages and Compilers for Parallel Computing - 18th International Workshop, LCPC 2005, Revised Selected Papers
T2 - 18th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2005
Y2 - 20 October 2005 through 22 October 2005
ER -