TY - GEN
T1 - Optimizing data permutations for SIMD devices
AU - Ren, Gang
AU - Wu, Peng
AU - Padua, David
PY - 2006
Y1 - 2006
N2 - The widespread presence of SIMD devices in today's microprocessors has made compiler techniques for these devices tremendously important. One of the most important and difficult issues that must be addressed by these techniques is the generation of the data permutation instructions needed for non-contiguous and misaligned memory references. These instructions are expensive and, therefore, it is of crucial importance to minimize their number to improve performance and, in many cases, enable speedups over scalar code. Although it is often difficult to optimize an isolated data reorganization operation, a collection of related data permutations can often be manipulated to reduce the number of operations. This paper presents a strategy to optimize all forms of data permutations. The strategy is organized into three steps. First, all data permutations in the source program are converted into a generic representation. These permutations can originate from vector accesses to non-contiguous and misaligned memory locations or result from compiler transformations. Second, an optimization algorithm is applied to reduce the number of data permutations in a basic block. By propagating permutations across statements and merging consecutive permutations whenever possible, the algorithm can significantly reduce the number of data permutations. Finally, a code generation algorithm translates generic permutation operations into native permutation instructions for the target platform. Experiments were conducted on various kinds of applications. The results show that up to 77% of the permutation instructions are eliminated and, as a result, the average performance improvement is 48% on VMX and 68% on SSE2. For several applications, near perfect speedups have been achieved on both platforms.
AB - The widespread presence of SIMD devices in today's microprocessors has made compiler techniques for these devices tremendously important. One of the most important and difficult issues that must be addressed by these techniques is the generation of the data permutation instructions needed for non-contiguous and misaligned memory references. These instructions are expensive and, therefore, it is of crucial importance to minimize their number to improve performance and, in many cases, enable speedups over scalar code. Although it is often difficult to optimize an isolated data reorganization operation, a collection of related data permutations can often be manipulated to reduce the number of operations. This paper presents a strategy to optimize all forms of data permutations. The strategy is organized into three steps. First, all data permutations in the source program are converted into a generic representation. These permutations can originate from vector accesses to non-contiguous and misaligned memory locations or result from compiler transformations. Second, an optimization algorithm is applied to reduce the number of data permutations in a basic block. By propagating permutations across statements and merging consecutive permutations whenever possible, the algorithm can significantly reduce the number of data permutations. Finally, a code generation algorithm translates generic permutation operations into native permutation instructions for the target platform. Experiments were conducted on various kinds of applications. The results show that up to 77% of the permutation instructions are eliminated and, as a result, the average performance improvement is 48% on VMX and 68% on SSE2. For several applications, near perfect speedups have been achieved on both platforms.
KW - Data Permutation
KW - Optimization
KW - SIMD Compilation
UR - http://www.scopus.com/inward/record.url?scp=33745222449&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745222449&partnerID=8YFLogxK
U2 - 10.1145/1133255.1133996
DO - 10.1145/1133255.1133996
M3 - Conference contribution
AN - SCOPUS:33745222449
SN - 1595933204
SN - 9781595933201
T3 - Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
SP - 118
EP - 131
BT - PLDI 2006 - Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation
T2 - PLDI 2006 - 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation
Y2 - 10 June 2006 through 16 June 2006
ER -