Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing

Christopher Rodrigues, Thomas Jablin, Abdul Dakkak, Wen-Mei W Hwu

Research output: Contribution to journal › Article

Abstract

Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that free developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that achievable in a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input. We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23–100% of its performance on a 128-core cluster.
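As an illustration of the loop-fusion and partitioning concerns the abstract names, the following is a minimal sketch in plain Python. It is not Triolet syntax and does not use Triolet's library; the function names (`unfused_sum`, `fused_sum`, `partitioned_sum`) and the toy workload are assumptions made purely for illustration. Each input `x` produces `x` outputs, so the inner loop has a variable size. The unfused version materialises the intermediate lists before reducing, while the fused version folds each output into the result as it is produced.

```python
from itertools import chain


def unfused_sum(xs):
    """Build every intermediate list, flatten, then reduce."""
    # Each input x yields x outputs: a variable number of outputs per input.
    nested = [[x * j for j in range(x)] for x in xs]
    # Flattening allocates a second temporary holding all outputs.
    flat = list(chain.from_iterable(nested))
    return sum(flat)


def fused_sum(xs):
    """Same result as unfused_sum, computed in one loop nest with no temporaries."""
    total = 0
    for x in xs:
        for j in range(x):
            total += x * j
    return total


def partitioned_sum(xs, parts):
    """Split the input into `parts` contiguous chunks, reduce each, then combine.

    The chunks are independent, so in a real system each could run on a
    different core or node; here they run sequentially for simplicity.
    """
    n = len(xs)
    bounds = [(n * p) // parts for p in range(parts + 1)]
    partials = [fused_sum(xs[bounds[p]:bounds[p + 1]]) for p in range(parts)]
    return sum(partials)


if __name__ == "__main__":
    data = [0, 1, 2, 3, 4]
    print(unfused_sum(data), fused_sum(data), partitioned_sum(data, 3))
```

All three functions compute the same value; they differ only in how much intermediate storage they allocate and in how the work is divided, which is the kind of separation between loop nesting and data partitioning that the abstract describes.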

Original language: English (US)
Pages (from-to): 247-258
Number of pages: 12
Journal: ACM SIGPLAN Notices
Volume: 49
Issue number: 8
DOIs: 10.1145/2555243.2555268
State: Published - Aug 2014

Fingerprint

Cluster computing
Computer systems programming
Data storage equipment
Parallel programming
Electric fuses
Computer programming
Computer programming languages
Interfaces (computer)
Scheduling
Decomposition
Communication

Keywords

  • Algorithmic skeletons
  • Loop fusion
  • Parallel programming

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing. / Rodrigues, Christopher; Jablin, Thomas; Dakkak, Abdul; Hwu, Wen-Mei W.

In: ACM SIGPLAN Notices, Vol. 49, No. 8, 08.2014, p. 247-258.

Rodrigues, Christopher; Jablin, Thomas; Dakkak, Abdul; Hwu, Wen-Mei W. / Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing. In: ACM SIGPLAN Notices. 2014; Vol. 49, No. 8. pp. 247-258.
@article{b7ac66f4d86f4dcb80449a45217f8469,
title = "Triolet: A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing",
abstract = "Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that free developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that achievable in a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input. We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23--100{\%} of its performance on a 128-core cluster.",
keywords = "Algorithmic skeletons, Loop fusion, Parallel programming",
author = "Christopher Rodrigues and Thomas Jablin and Abdul Dakkak and Hwu, {Wen-Mei W}",
year = "2014",
month = "8",
doi = "10.1145/2555243.2555268",
language = "English (US)",
volume = "49",
pages = "247--258",
journal = "ACM SIGPLAN Notices",
issn = "1523-2867",
publisher = "Association for Computing Machinery (ACM)",
number = "8",

}

TY - JOUR

T1 - Triolet

T2 - A programming system that unifies algorithmic skeleton interfaces for high-performance cluster computing

AU - Rodrigues, Christopher

AU - Jablin, Thomas

AU - Dakkak, Abdul

AU - Hwu, Wen-Mei W

PY - 2014/8

Y1 - 2014/8

AB - Functional algorithmic skeletons promise a high-level programming interface for distributed-memory clusters that free developers from concerns of task decomposition, scheduling, and communication. Unfortunately, prior distributed functional skeleton frameworks do not deliver performance comparable to that achievable in a low-level distributed programming model such as C with MPI and OpenMP, even when used in concert with high-performance array libraries. There are several causes: they do not take advantage of shared memory on each cluster node; they impose a fixed partitioning strategy on input data; and they have limited ability to fuse loops involving skeletons that produce a variable number of outputs per input. We address these shortcomings in the Triolet programming language through a modular library design that separates concerns of parallelism, loop nesting, and data partitioning. We show how Triolet substantially improves the parallel performance of algorithms involving array traversals and nested, variable-size loops over what is achievable in Eden, a distributed variant of Haskell. We further demonstrate how Triolet can substantially simplify parallel programming relative to C with MPI and OpenMP while achieving 23–100% of its performance on a 128-core cluster.

KW - Algorithmic skeletons

KW - Loop fusion

KW - Parallel programming

UR - http://www.scopus.com/inward/record.url?scp=84950134997&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84950134997&partnerID=8YFLogxK

U2 - 10.1145/2555243.2555268

DO - 10.1145/2555243.2555268

M3 - Article

AN - SCOPUS:84950134997

VL - 49

SP - 247

EP - 258

JO - ACM SIGPLAN Notices

JF - ACM SIGPLAN Notices

SN - 1523-2867

IS - 8

ER -