TY - GEN
T1 - The STAPL skeleton framework
AU - Zandifar, Mani
AU - Thomas, Nathan
AU - Amato, Nancy Marie
AU - Rauchwerger, Lawrence
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - This paper describes the STAPL Skeleton Framework, a high-level skeletal approach for parallel programming. This framework abstracts the underlying details of data distribution and parallelism from programmers and enables them to express parallel programs as a composition of existing elementary skeletons such as map, map-reduce, scan, zip, butterfly, allreduce, alltoall and user-defined custom skeletons. Skeletons in this framework are defined as parametric data flow graphs, and their compositions are defined in terms of data flow graph compositions. Defining the composition in this manner allows dependencies between skeletons to be defined in terms of point-to-point dependencies, avoiding unnecessary global synchronizations. To show the ease of composability and expressivity, we implemented the NAS Integer Sort (IS) and Embarrassingly Parallel (EP) benchmarks using skeletons and demonstrate comparable performance to the hand-optimized reference implementations. To demonstrate scalable performance, we show a transformation which enables applications written in terms of skeletons to run on more than 100,000 cores.
AB - This paper describes the STAPL Skeleton Framework, a high-level skeletal approach for parallel programming. This framework abstracts the underlying details of data distribution and parallelism from programmers and enables them to express parallel programs as a composition of existing elementary skeletons such as map, map-reduce, scan, zip, butterfly, allreduce, alltoall and user-defined custom skeletons. Skeletons in this framework are defined as parametric data flow graphs, and their compositions are defined in terms of data flow graph compositions. Defining the composition in this manner allows dependencies between skeletons to be defined in terms of point-to-point dependencies, avoiding unnecessary global synchronizations. To show the ease of composability and expressivity, we implemented the NAS Integer Sort (IS) and Embarrassingly Parallel (EP) benchmarks using skeletons and demonstrate comparable performance to the hand-optimized reference implementations. To demonstrate scalable performance, we show a transformation which enables applications written in terms of skeletons to run on more than 100,000 cores.
UR - http://www.scopus.com/inward/record.url?scp=84937509918&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84937509918&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-17473-0_12
DO - 10.1007/978-3-319-17473-0_12
M3 - Conference contribution
AN - SCOPUS:84937509918
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 176
EP - 190
BT - Languages and Compilers for Parallel Computing - 27th International Workshop, LCPC 2014, Revised Selected Papers
A2 - Brodman, James
A2 - Tu, Peng
PB - Springer
T2 - 27th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2014
Y2 - 15 September 2014 through 17 September 2014
ER -