Abstract
Several approaches implement efficient BFS algorithms for multicores and for GPUs. However, when targeting heterogeneous architectures, it is still an open problem how to distribute the work among the CPU cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogeneous chips (a multicore CPU and an integrated GPU). In particular, we propose three heterogeneous approaches that exploit the collaboration between both devices: Selective, Concurrent and Asynchronous. We identify how to take advantage of the features of social network graphs, that are a particular example of highly connected graphs-with fewer iterations and more unbalanced-, as well as the drawbacks of each algorithmic implementation. One key feature of our approaches is that they switch between different versions of the algorithm, depending on the device that collaborates in the computation. Through exhaustive evaluation we find that our heterogeneous implementations can be up to 1.56× faster and 1.32× more energy efficient with respect to the best baseline where only one device is used, being the overhead w.r.t. an oracle scheduler below 10%. We also compare with other related heterogeneous approach finding that ours can be up to 3.6× faster.
Original language | English (US) |
---|---|
Pages (from-to) | 282-294 |
Number of pages | 13 |
Journal | Journal of Parallel and Distributed Computing |
Volume | 120 |
DOIs | |
State | Published - Oct 2018 |
Keywords
- BFS
- Heterogeneous chips
- Performance–energy efficiency
- Social network graphs
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computer Networks and Communications
- Artificial Intelligence