TY - CONF
T1 - Towards an achievable performance for the loop nests
AU - Shivam, Aniket
AU - Watkinson, Neftali
AU - Nicolau, Alexandru
AU - Padua, David
AU - Veidenbaum, Alexander V.
N1 - Funding Information:
This work was supported by NSF award XPS 1533926.
Publisher Copyright:
© 2019, Springer Nature Switzerland AG.
PY - 2019
Y1 - 2019
N2 - Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem. The sequence is guided by heuristics and/or analytical models, and there is no way of knowing how close it gets to optimal performance or whether any headroom for improvement remains. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach to characterizing loop nests based on their hardware performance counter values, together with a machine learning approach that predicts which compiler will generate the fastest code for a given loop nest. The prediction is made both for auto-vectorized, serial compilation and for auto-parallelization. The results, based on the machine learning predictions, show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for serial code and from 1.30x to 1.71x for auto-parallelized code.
AB - Numerous code optimization techniques, including loop nest optimizations, have been developed over the last four decades. Loop optimization techniques transform loop nests to improve the performance of the code on a target architecture, including exposing parallelism. Finding and evaluating an optimal, semantics-preserving sequence of transformations is a complex problem. The sequence is guided by heuristics and/or analytical models, and there is no way of knowing how close it gets to optimal performance or whether any headroom for improvement remains. This paper makes two contributions. First, it uses a comparative analysis of loop optimizations/transformations across multiple compilers to determine how much headroom may exist for each compiler. Second, it presents an approach to characterizing loop nests based on their hardware performance counter values, together with a machine learning approach that predicts which compiler will generate the fastest code for a given loop nest. The prediction is made both for auto-vectorized, serial compilation and for auto-parallelization. The results, based on the machine learning predictions, show that the headroom for state-of-the-art compilers ranges from 1.10x to 1.42x for serial code and from 1.30x to 1.71x for auto-parallelized code.
UR - http://www.scopus.com/inward/record.url?scp=85076289982&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85076289982&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-34627-0_6
DO - 10.1007/978-3-030-34627-0_6
M3 - Conference contribution
AN - SCOPUS:85076289982
SN - 9783030346263
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 70
EP - 77
BT - Languages and Compilers for Parallel Computing - 31st International Workshop, LCPC 2018, Revised Selected Papers
A2 - Hall, Mary
A2 - Sundar, Hari
PB - Springer
T2 - 31st International Workshop on Languages and Compilers for Parallel Computing, LCPC 2018
Y2 - 9 October 2018 through 11 October 2018
ER -