TY - JOUR
T1 - Exploiting social network graph characteristics for efficient BFS on heterogeneous chips
AU - Remis, Luis
AU - Garzaran, Maria Jesus
AU - Asenjo, Rafael
AU - Navarro, Angeles
N1 - This work is supported in part by NSF under grant CNS-1319657 and by Spanish projects TIN2013-42253-P, TIN2016-80920-R and P11-TIC-08144.
This work is supported in part by NSF under grant CNS-1319657 and by Spanish projects TIN2013-42253-P, TIN2016-80920-R and P11-TIC-08144.
PY - 2018/10
Y1 - 2018/10
N2 - Several approaches implement efficient BFS algorithms for multicores and for GPUs. However, when targeting heterogeneous architectures, it is still an open problem how to distribute the work among the CPU cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogeneous chips (a multicore CPU and an integrated GPU). In particular, we propose three heterogeneous approaches that exploit the collaboration between both devices: Selective, Concurrent and Asynchronous. We identify how to take advantage of the features of social network graphs, that are a particular example of highly connected graphs-with fewer iterations and more unbalanced-, as well as the drawbacks of each algorithmic implementation. One key feature of our approaches is that they switch between different versions of the algorithm, depending on the device that collaborates in the computation. Through exhaustive evaluation we find that our heterogeneous implementations can be up to 1.56× faster and 1.32× more energy efficient with respect to the best baseline where only one device is used, being the overhead w.r.t. an oracle scheduler below 10%. We also compare with other related heterogeneous approach finding that ours can be up to 3.6× faster.
AB - Several approaches implement efficient BFS algorithms for multicores and for GPUs. However, when targeting heterogeneous architectures, it is still an open problem how to distribute the work among the CPU cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogeneous chips (a multicore CPU and an integrated GPU). In particular, we propose three heterogeneous approaches that exploit the collaboration between both devices: Selective, Concurrent and Asynchronous. We identify how to take advantage of the features of social network graphs, that are a particular example of highly connected graphs-with fewer iterations and more unbalanced-, as well as the drawbacks of each algorithmic implementation. One key feature of our approaches is that they switch between different versions of the algorithm, depending on the device that collaborates in the computation. Through exhaustive evaluation we find that our heterogeneous implementations can be up to 1.56× faster and 1.32× more energy efficient with respect to the best baseline where only one device is used, being the overhead w.r.t. an oracle scheduler below 10%. We also compare with other related heterogeneous approach finding that ours can be up to 3.6× faster.
KW - BFS
KW - Heterogeneous chips
KW - Performance–energy efficiency
KW - Social network graphs
UR - https://www.scopus.com/pages/publications/85044537905
UR - https://www.scopus.com/pages/publications/85044537905#tab=citedBy
U2 - 10.1016/j.jpdc.2017.11.003
DO - 10.1016/j.jpdc.2017.11.003
M3 - Article
AN - SCOPUS:85044537905
SN - 0743-7315
VL - 120
SP - 282
EP - 294
JO - Journal of Parallel and Distributed Computing
JF - Journal of Parallel and Distributed Computing
ER -