Design of a Multithreaded Barnes-Hut Algorithm for Multicore Clusters

Junchao Zhang, Babak Behzad, Marc Snir

Research output: Contribution to journal › Article › peer-review


This paper describes an implementation of the Barnes-Hut algorithm on multicore clusters. Built on a partitioned global address space (PGAS) library, the design integrates intranode multithreading with internode one-sided communication, exemplifying a PGAS + X programming style. Within a node, the computation is decomposed into tasks (subtasks), and multitasking is used to hide network latency. We study the tradeoffs between locality in private caches and locality in shared caches and apply the resulting insights to the design. As a result, our implementation consumes less memory per core, requires less internode communication, and benefits from better load-balancing strategies. The final code achieves up to a 41 percent performance improvement over a non-multithreaded counterpart. Through detailed comparison, we also show its advantages over other well-known Barnes-Hut implementations in both programming complexity and performance.

Original language: English (US)
Article number: 6837521
Pages (from-to): 1861-1873
Number of pages: 13
Journal: IEEE Transactions on Parallel and Distributed Systems
Issue number: 7
State: Published - Jul 1 2015


Keywords

  • Barnes-Hut
  • PGAS
  • cluster
  • multicore
  • n-body

ASJC Scopus subject areas

  • Signal Processing
  • Hardware and Architecture
  • Computational Theory and Mathematics

