TY - JOUR
T1 - Compiler auto-vectorization with imitation learning
AU - Mendis, Charith
AU - Yang, Cambridge
AU - Pu, Yewen
AU - Amarasinghe, Saman
AU - Carbin, Michael
N1 - Funding Information:
We would like to thank Darsh Shah who was initially involved with this project and all reviewers for insightful comments and suggestions. This research was supported by DARPA D3M Award #FA8750-17-2-0126, DARPA HACCS Award #HR0011-18-C-0059 and DARPA SDH Award #HR0011-18-3-0007. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.
Publisher Copyright:
© 2019 Neural information processing systems foundation. All rights reserved.
PY - 2019
Y1 - 2019
N2 - Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit fine-grained data level parallelism. To exploit this parallelism, compilers employ auto-vectorization techniques to automatically convert scalar code into vector code. Larsen & Amarasinghe (2000) first introduced superword level parallelism (SLP) based vectorization, which is a form of vectorization popularly used by compilers. Current compilers employ hand-crafted heuristics and typically only follow one SLP vectorization strategy which can be suboptimal. Recently, Mendis & Amarasinghe (2018) formulated the instruction packing problem of SLP vectorization by leveraging an integer linear programming (ILP) solver, achieving superior runtime performance. In this work, we explore whether it is feasible to imitate optimal decisions made by their ILP solution by fitting a graph neural network policy. We show that the learnt policy, Vemal, produces a vectorization scheme that is better than the well-tuned heuristics used by the LLVM compiler. More specifically, the learnt agent produces a vectorization strategy that has a 22.6% higher average reduction in cost compared to the LLVM compiler when measured using its own cost model, and matches the runtime performance of the ILP based solution in 5 out of 7 applications in the NAS benchmark suite.
AB - Modern microprocessors are equipped with single instruction multiple data (SIMD) or vector instruction sets which allow compilers to exploit fine-grained data level parallelism. To exploit this parallelism, compilers employ auto-vectorization techniques to automatically convert scalar code into vector code. Larsen & Amarasinghe (2000) first introduced superword level parallelism (SLP) based vectorization, which is a form of vectorization popularly used by compilers. Current compilers employ hand-crafted heuristics and typically only follow one SLP vectorization strategy which can be suboptimal. Recently, Mendis & Amarasinghe (2018) formulated the instruction packing problem of SLP vectorization by leveraging an integer linear programming (ILP) solver, achieving superior runtime performance. In this work, we explore whether it is feasible to imitate optimal decisions made by their ILP solution by fitting a graph neural network policy. We show that the learnt policy, Vemal, produces a vectorization scheme that is better than the well-tuned heuristics used by the LLVM compiler. More specifically, the learnt agent produces a vectorization strategy that has a 22.6% higher average reduction in cost compared to the LLVM compiler when measured using its own cost model, and matches the runtime performance of the ILP based solution in 5 out of 7 applications in the NAS benchmark suite.
UR - http://www.scopus.com/inward/record.url?scp=85090174274&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85090174274&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85090174274
SN - 1049-5258
VL - 32
JO - Advances in Neural Information Processing Systems
JF - Advances in Neural Information Processing Systems
T2 - 33rd Annual Conference on Neural Information Processing Systems, NeurIPS 2019
Y2 - 8 December 2019 through 14 December 2019
ER -