TY - JOUR
T1 - Is search really necessary to generate high-performance BLAS?
AU - Yotov, Kamen
AU - Li, Xiaoming
AU - Ren, Gang
AU - Garzarán, María Jesús
AU - Padua, David
AU - Pingali, Keshav
AU - Stodghill, Paul
N1 - Funding Information:
Manuscript received April 13, 2004; revised October 15, 2004. This work was supported by the National Science Foundation under Grants ACI-9870687, EIA-9972853, ACI-0085969, ACI-0090217, ACI-0103723, and ACI-012140.
PY - 2005/2
Y1 - 2005/2
AB - A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and loop unrolling factors. Traditional compilers use simple analytical models to compute these values. In contrast, library generators like ATLAS use global search over the space of parameter values by generating programs with many different combinations of parameter values, and running them on the actual hardware to determine which values give the best performance. It is widely believed that traditional model-driven optimization cannot compete with search-based empirical optimization because tractable analytical models cannot capture all the complexities of modern high-performance architectures, but few quantitative comparisons have been done to date. To make such a comparison, we replaced the global search engine in ATLAS with a model-driven optimization engine and measured the relative performance of the code produced by the two systems on a variety of architectures. Since both systems use the same code generator, any differences in the performance of the code produced by the two systems can come only from differences in optimization parameter values. Our experiments show that model-driven optimization can be surprisingly effective and can generate code with performance comparable to that of code generated by ATLAS using global search.
KW - Basic Linear Algebra Subprograms (BLAS)
KW - Compilers
KW - Empirical optimization
KW - High-performance computing
KW - Library generators
KW - Model-driven optimization
KW - Program optimization
UR - http://www.scopus.com/inward/record.url?scp=20744459570&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=20744459570&partnerID=8YFLogxK
U2 - 10.1109/JPROC.2004.840444
DO - 10.1109/JPROC.2004.840444
M3 - Article
AN - SCOPUS:20744459570
VL - 93
SP - 358
EP - 385
JO - Proceedings of the IEEE
JF - Proceedings of the IEEE
SN - 0018-9219
IS - 2
ER -