TY - GEN
T1 - All you need is superword-level parallelism
T2 - 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2022
AU - Chen, Yishen
AU - Mendis, Charith
AU - Amarasinghe, Saman
N1 - We thank our shepherd Laure Gonnord and the anonymous reviewers for their valuable suggestions. We thank Teodoro Collin, Logan Weber, Jesse Michel, Alex Renda, Daniel Donenfeld, and Changwan Hong for reading early drafts of this paper and providing feedback. Our work is supported by the DARPA/SRC JUMP ADA Center; the Toyota Research Institute; the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research under Award Numbers DESC0008923 and DESC0018121; NSF Grant No. CCF-1533753; and DARPA under Awards HR0011-18-3-0007 and HR0011-20-9-0017. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the aforementioned funding agencies.
PY - 2022/6/9
Y1 - 2022/6/9
N2 - Superword-level parallelism (SLP) vectorization is a proven technique for vectorizing straight-line code. It works by replacing independent, isomorphic instructions with equivalent vector instructions. Larsen and Amarasinghe originally proposed using SLP vectorization (together with loop unrolling) as a simpler, more flexible alternative to traditional loop vectorization. However, this vision of replacing traditional loop vectorization has not been realized because SLP vectorization cannot directly reason with control flow. In this work, we introduce SuperVectorization, a new vectorization framework that generalizes SLP vectorization to uncover parallelism that spans different basic blocks and loop nests. With the capability to systematically vectorize instructions across control-flow regions such as basic blocks and loops, our framework simultaneously subsumes the roles of inner-loop, outer-loop, and straight-line vectorizers while retaining the flexibility of SLP vectorization (e.g., partial vectorization). Our evaluation shows that a single instance of our vectorizer is competitive with and, in many cases, significantly better than LLVM's vectorization pipeline, which includes both loop and SLP vectorizers. For example, on an unoptimized, sequential volume renderer from Pharr and Mark, our vectorizer gains a 3.28× speedup, whereas none of the production compilers that we tested vectorizes it due to its complex control-flow constructs.
KW - auto-vectorization
KW - optimization
UR - https://www.scopus.com/pages/publications/85132265149
UR - https://www.scopus.com/pages/publications/85132265149#tab=citedBy
U2 - 10.1145/3519939.3523701
DO - 10.1145/3519939.3523701
M3 - Conference contribution
AN - SCOPUS:85132265149
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 301
EP - 315
BT - PLDI 2022 - Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
A2 - Jhala, Ranjit
A2 - Dillig, Isil
PB - Association for Computing Machinery
Y2 - 13 June 2022 through 17 June 2022
ER -