TY - GEN
T1 - Performance portability with the Chapel language
AU - Sidelnik, Albert
AU - Maleki, Saeed
AU - Chamberlain, Bradford L.
AU - Garzarán, María J.
AU - Padua, David
PY - 2012
Y1 - 2012
N2 - It has been widely shown that high-throughput computing architectures such as GPUs offer large performance gains compared with their traditional low-latency counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, loss of portability across different architectures, explicit data movement, and challenges in performance optimization. This paper presents novel methods and compiler transformations that increase programmer productivity by enabling users of the language Chapel to provide a single code implementation that the compiler can then use to target not only conventional multiprocessors, but also high-throughput and hybrid machines. Rather than resorting to different parallel libraries or annotations for a given parallel platform, this work leverages a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of providing portability across different parallel architectures. Finally, this work presents experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA on both GPUs and multicore platforms.
AB - It has been widely shown that high-throughput computing architectures such as GPUs offer large performance gains compared with their traditional low-latency counterparts for many applications. The downside to these architectures is that the current programming models present numerous challenges to the programmer: lower-level languages, loss of portability across different architectures, explicit data movement, and challenges in performance optimization. This paper presents novel methods and compiler transformations that increase programmer productivity by enabling users of the language Chapel to provide a single code implementation that the compiler can then use to target not only conventional multiprocessors, but also high-throughput and hybrid machines. Rather than resorting to different parallel libraries or annotations for a given parallel platform, this work leverages a language that has been designed from first principles to address the challenge of programming for parallelism and locality. This also has the advantage of providing portability across different parallel architectures. Finally, this work presents experimental results from the Parboil benchmark suite which demonstrate that codes written in Chapel achieve performance comparable to the original versions implemented in CUDA on both GPUs and multicore platforms.
UR - http://www.scopus.com/inward/record.url?scp=84866876789&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866876789&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2012.60
DO - 10.1109/IPDPS.2012.60
M3 - Conference contribution
AN - SCOPUS:84866876789
SN - 9780769546759
T3 - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
SP - 582
EP - 594
BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
T2 - 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Y2 - 21 May 2012 through 25 May 2012
ER -