Fine-grained task migration for graph algorithms using processing in memory

Paula Aguilera, Dong Ping Zhang, Nam Sung Kim, Nuwan Jayasena

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Graphs are used in a wide variety of application domains, from social science to machine learning. Graph algorithms present large numbers of irregular accesses with little data reuse to amortize the high cost of memory accesses, requiring high memory bandwidth. Processing in memory (PIM) implemented through 3D die-stacking can deliver this high memory bandwidth. In a system with multiple memory modules with PIM, the in-memory compute logic has low-latency and high-bandwidth access to its local memory, while accesses to remote memory introduce high latency and energy consumption. Ideally, in such a system, computation and data are partitioned among the PIM devices to maximize data locality. But the irregular memory access patterns present in graph applications make it difficult to guarantee that the computation in each PIM device will only access its local data. A large number of remote memory accesses can negate the benefits of using PIM. In this paper, we examine the feasibility and potential of fine-grained work migration to reduce remote data accesses in systems with multiple PIM devices. First, we propose a data-driven implementation of our study algorithms: breadth-first search (BFS), single-source shortest path (SSSP), and betweenness centrality (BC), where each PIM device has a queue holding the vertices it needs to process. New vertices that need to be processed are enqueued at the PIM device co-located with the memory that stores those vertices. Second, we propose hardware support that takes advantage of PIM to implement highly efficient queues that improve the performance of the queuing framework by up to 16.7%. Third, we develop a timing model for the queuing framework to explore the benefits of work migration vs. remote memory accesses. Finally, our analysis using the above framework shows that naïve task migration can lead to performance degradation, and it identifies trade-offs among data locality, redundant computation, and load balance among PIM devices that must be taken into account to realize the potential benefits of fine-grained task migration.
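To make the data-driven queuing scheme described in the abstract concrete, the following sketch simulates it sequentially for BFS. It is purely illustrative and not taken from the paper: the CSR graph layout, the block partition of vertices across PIM devices, and all names (Graph, owner, bfs) are assumptions made here for illustration, and the proposed hardware-assisted queues and parallel execution are not modeled.

    #include <iostream>
    #include <queue>
    #include <utility>
    #include <vector>

    // Hypothetical sketch: sequential simulation of per-PIM-device work queues
    // for a data-driven BFS. Assumes a CSR graph and a block partition of the
    // vertex ID space across PIM devices; neither is a detail from the paper.

    struct Graph {
        std::vector<int> row_ptr;  // CSR row offsets (size = num_vertices + 1)
        std::vector<int> col_idx;  // CSR adjacency targets
        int num_vertices() const { return static_cast<int>(row_ptr.size()) - 1; }
    };

    // Assumed placement policy: vertex v resides in the local memory of PIM
    // device owner(v), modeled as a simple block partition of vertex IDs.
    int owner(int v, int num_pims, int num_vertices) {
        int per_pim = (num_vertices + num_pims - 1) / num_pims;
        return v / per_pim;
    }

    // Data-driven BFS: each simulated PIM device drains its own queue and
    // enqueues newly discovered vertices at the device that owns them, so the
    // later processing of each vertex touches only that device's local memory.
    // A real multi-device implementation would additionally need atomic
    // visited updates and the hardware queue support the paper proposes.
    std::vector<int> bfs(const Graph& g, int source, int num_pims) {
        const int n = g.num_vertices();
        std::vector<int> dist(n, -1);
        std::vector<std::queue<int>> current(num_pims), next(num_pims);

        dist[source] = 0;
        current[owner(source, num_pims, n)].push(source);

        bool work_left = true;
        while (work_left) {
            work_left = false;
            for (int pim = 0; pim < num_pims; ++pim) {  // each device drains its queue
                while (!current[pim].empty()) {
                    int u = current[pim].front();
                    current[pim].pop();
                    for (int e = g.row_ptr[u]; e < g.row_ptr[u + 1]; ++e) {
                        int v = g.col_idx[e];
                        if (dist[v] == -1) {
                            dist[v] = dist[u] + 1;
                            // Enqueue v at the PIM device co-located with its memory.
                            next[owner(v, num_pims, n)].push(v);
                        }
                    }
                }
            }
            for (int pim = 0; pim < num_pims; ++pim) {
                if (!next[pim].empty()) work_left = true;
                std::swap(current[pim], next[pim]);
            }
        }
        return dist;
    }

    int main() {
        // Tiny example: a 6-vertex path graph 0-1-2-3-4-5, two simulated PIM devices.
        Graph g;
        g.row_ptr = {0, 1, 3, 5, 7, 9, 10};
        g.col_idx = {1, 0, 2, 1, 3, 2, 4, 3, 5, 4};
        for (int d : bfs(g, 0, 2)) std::cout << d << ' ';
        std::cout << '\n';  // prints: 0 1 2 3 4 5
    }

The owner-based enqueue is the locality mechanism the abstract describes; whether migrating a vertex this way beats a remote access is exactly the trade-off the paper's timing model explores.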

Original language: English (US)
Title of host publication: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 489-498
Number of pages: 10
ISBN (Electronic): 9781509021406
State: Published - Jul 18 2016
Event: 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016 - Chicago, United States
Duration: May 23 2016 - May 27 2016

Publication series

Name: Proceedings - 2016 IEEE 30th International Parallel and Distributed Processing Symposium, IPDPS 2016

Other

Other: 30th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2016
Country: United States
City: Chicago
Period: 5/23/16 - 5/27/16

Keywords

  • Graph Algorithms
  • Processing In Memory

ASJC Scopus subject areas

  • Computer Networks and Communications
