Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing

Christopher Rodrigues, Thomas Jablin, Abdul Dakkak, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that frees developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that achievable in a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input. We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23–100% of its performance on a 128-core cluster.
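
As a rough illustration of what fusing loops with "a variable number of outputs per input" means, the sketch below contrasts an unfused pipeline, which materializes a temporary list after each stage, with a hand-fused single pass. It is plain sequential Python, not Triolet code and not the paper's implementation; the function names `divisors`, `unfused_sum`, and `fused_sum` are hypothetical and chosen only for this example.

```python
def divisors(n):
    """Variable-size inner loop: each input yields a different number of outputs."""
    return (d for d in range(1, n + 1) if n % d == 0)


def unfused_sum(xs):
    # Stage 1: expand every input and store all outputs in a temporary list.
    expanded = [d for x in xs for d in divisors(x)]
    # Stage 2: filter into a second temporary list.
    evens = [d for d in expanded if d % 2 == 0]
    # Stage 3: reduce.
    return sum(evens)


def fused_sum(xs):
    # Single pass: each value flows from the nested loop straight into the
    # reduction, so no intermediate list is ever built.
    total = 0
    for x in xs:
        for d in divisors(x):
            if d % 2 == 0:
                total += d
    return total
```

Both functions compute the same result; the fused version simply avoids the intermediate storage that a skeleton library would otherwise have to allocate between stages.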

Original language: English (US)
Title of host publication: PPoPP 2014 - Proceedings of the 2014 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Pages: 247-258
Number of pages: 12
DOI: https://doi.org/10.1145/2555243.2555268
State: Published - Mar 10 2014
Event: 2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014 - Orlando, FL, United States
Duration: Feb 15 2014 → Feb 19 2014

Publication series

Name: Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

Other

Other: 2014 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2014
Country: United States
City: Orlando, FL
Period: 2/15/14 → 2/19/14

Fingerprint

Cluster computing
Computer systems programming
Data storage equipment
Parallel programming
Electric fuses
Computer programming
Computer programming languages
Interfaces (computer)
Scheduling
Decomposition
Communication

Keywords

  • Algorithmic skeletons
  • Loop fusion
  • Parallel programming

ASJC Scopus subject areas

  • Software

Cite this

Rodrigues, C., Jablin, T., Dakkak, A., & Hwu, W-M. W. (2014). Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing. In PPoPP 2014 - Proceedings of the 2014 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (pp. 247-258). (Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP). https://doi.org/10.1145/2555243.2555268

@inproceedings{7f8c2d27bff444269072417f186e38f5,
title = "Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing",
abstract = "Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that frees developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that achievable in a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input. We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23--100{\%} of its performance on a 128-core cluster.",
keywords = "Algorithmic skeletons, Loop fusion, Parallel programming",
author = "Christopher Rodrigues and Thomas Jablin and Abdul Dakkak and Hwu, {Wen-Mei W}",
year = "2014",
month = "3",
day = "10",
doi = "10.1145/2555243.2555268",
language = "English (US)",
isbn = "9781450326568",
series = "Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP",
pages = "247--258",
booktitle = "PPoPP 2014 - Proceedings of the 2014 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming",

}

TY - GEN

T1 - Triolet

T2 - A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing

AU - Rodrigues, Christopher

AU - Jablin, Thomas

AU - Dakkak, Abdul

AU - Hwu, Wen-Mei W

PY - 2014/3/10

Y1 - 2014/3/10

N2 - Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that frees developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that achievable in a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input. We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23–100% of its performance on a 128-core cluster.

KW - Algorithmic skeletons

KW - Loop fusion

KW - Parallel programming

UR - http://www.scopus.com/inward/record.url?scp=84896900933&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84896900933&partnerID=8YFLogxK

U2 - 10.1145/2555243.2555268

DO - 10.1145/2555243.2555268

M3 - Conference contribution

AN - SCOPUS:84896900933

SN - 9781450326568

T3 - Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP

SP - 247

EP - 258

BT - PPoPP 2014 - Proceedings of the 2014 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming

ER -