TY - GEN
T1 - Optimizing the barnes-hut algorithm in UPC
AU - Zhang, Junchao
AU - Behzad, Babak
AU - Snir, Marc
PY - 2011
Y1 - 2011
N2 - PGAS languages'support of a global name space facilitates the expression of parallel algorithms, since communication is implicit. This is especially convenient when writing irregular applications with data-dependent, dynamically changing communication patterns. However, programming in a shared memory style, with no explicit control of communication, may result in poor performance. The problem may be due to weaknesses of current implementations of PGAS languages or limitations inherent in these languages. To clarify which is the case, we discuss an implementation in UPC of the Barnes-Hut algorithm. A literal port of a good quality shared-memory implementation (merely replacing shared arrays with partitioned global arrays) achieves abysmal performance - more than 1000 times worse than a message-passing implementation. We achieve in UPC a performance comparable to message-passing with a series of optimizations. Most of these optimizations could be performed with limited changes in the source code using an enhanced run-time and a few language extensions or pragmas. We discuss the implications to the programmer, the compiler and PGAS languages themselves.
AB - PGAS languages'support of a global name space facilitates the expression of parallel algorithms, since communication is implicit. This is especially convenient when writing irregular applications with data-dependent, dynamically changing communication patterns. However, programming in a shared memory style, with no explicit control of communication, may result in poor performance. The problem may be due to weaknesses of current implementations of PGAS languages or limitations inherent in these languages. To clarify which is the case, we discuss an implementation in UPC of the Barnes-Hut algorithm. A literal port of a good quality shared-memory implementation (merely replacing shared arrays with partitioned global arrays) achieves abysmal performance - more than 1000 times worse than a message-passing implementation. We achieve in UPC a performance comparable to message-passing with a series of optimizations. Most of these optimizations could be performed with limited changes in the source code using an enhanced run-time and a few language extensions or pragmas. We discuss the implications to the programmer, the compiler and PGAS languages themselves.
UR - http://www.scopus.com/inward/record.url?scp=83155193225&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=83155193225&partnerID=8YFLogxK
U2 - 10.1145/2063384.2063485
DO - 10.1145/2063384.2063485
M3 - Conference contribution
AN - SCOPUS:83155193225
SN - 9781450307710
T3 - Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
BT - Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
T2 - 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11
Y2 - 12 November 2011 through 18 November 2011
ER -