TY - GEN
T1 - All you need is superword-level parallelism: systematic control-flow vectorization with SLP
T2 - 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation, PLDI 2022
AU - Chen, Yishen
AU - Mendis, Charith
AU - Amarasinghe, Saman
N1 - Publisher Copyright:
© 2022 Owner/Author.
PY - 2022/6/9
Y1 - 2022/6/9
N2 - Superword-level parallelism (SLP) vectorization is a proven technique for vectorizing straight-line code. It works by replacing independent, isomorphic instructions with equivalent vector instructions. Larsen and Amarasinghe originally proposed using SLP vectorization (together with loop unrolling) as a simpler, more flexible alternative to traditional loop vectorization. However, this vision of replacing traditional loop vectorization has not been realized because SLP vectorization cannot directly reason with control flow. In this work, we introduce SuperVectorization, a new vectorization framework that generalizes SLP vectorization to uncover parallelism that spans different basic blocks and loop nests. With the capability to systematically vectorize instructions across control-flow regions such as basic blocks and loops, our framework simultaneously subsumes the roles of inner-loop, outer-loop, and straight-line vectorizers while retaining the flexibility of SLP vectorization (e.g., partial vectorization). Our evaluation shows that a single instance of our vectorizer is competitive with and, in many cases, significantly better than LLVM's vectorization pipeline, which includes both loop and SLP vectorizers. For example, on an unoptimized, sequential volume renderer from Pharr and Mark, our vectorizer gains a 3.28× speedup, whereas none of the production compilers that we tested can vectorize it due to its complex control-flow constructs.
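N1 - A minimal C sketch of the superword-level parallelism idea described in the abstract, assuming an x86/SSE target; the function names are hypothetical, and the hand-written intrinsics stand in for the packing an SLP vectorizer would perform automatically on the scalar form.

#include <xmmintrin.h>  /* SSE intrinsics */

/* Scalar form: four independent, isomorphic additions on adjacent elements. */
void add4_scalar(float *a, const float *b, const float *c) {
    a[0] = b[0] + c[0];
    a[1] = b[1] + c[1];
    a[2] = b[2] + c[2];
    a[3] = b[3] + c[3];
}

/* Roughly what an SLP vectorizer produces for the same code: the four
   isomorphic scalar adds are replaced by a single 4-wide vector add
   (hypothetical hand-written equivalent, shown with intrinsics). */
void add4_vectorized(float *a, const float *b, const float *c) {
    __m128 vb = _mm_loadu_ps(b);           /* pack b[0..3] into one register */
    __m128 vc = _mm_loadu_ps(c);           /* pack c[0..3] into one register */
    _mm_storeu_ps(a, _mm_add_ps(vb, vc));  /* one vector add, one vector store */
}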
KW - auto-vectorization
KW - optimization
UR - http://www.scopus.com/inward/record.url?scp=85132265149&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132265149&partnerID=8YFLogxK
U2 - 10.1145/3519939.3523701
DO - 10.1145/3519939.3523701
M3 - Conference contribution
AN - SCOPUS:85132265149
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 301
EP - 315
BT - PLDI 2022 - Proceedings of the 43rd ACM SIGPLAN International Conference on Programming Language Design and Implementation
A2 - Jhala, Ranjit
A2 - Dillig, Isil
PB - Association for Computing Machinery
Y2 - 13 June 2022 through 17 June 2022
ER -