TY - GEN
T1 - Breadth-First Search on Heterogeneous Platforms
T2 - 28th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2016
AU - Remis, Luis
AU - Garzaran, Maria Jesus
AU - Asenjo, Rafael
AU - Navarro, Angeles
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/12/16
Y1 - 2016/12/16
N2 - Breadth-First Search (BFS) is the core of many graph analysis algorithms and it is used in many problems, such as social network, computer network analysis, and data organization. BFS is an iterative algorithm that due to its irregular behavior is quite challenging to parallelize. Several approaches implement efficient algorithms for BFS for multicore architectures and for Graphics Processors, but it is still an open problem how to distribute the work among the main cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogenous architectures (highend and embedded mobile processors composed of a multi-core CPU and an integrated GPU) with a focus on social network graphs. In particular, we propose two heterogenous approaches to exploit both devices. The first one, called Selective, selects on which device to execute each iteration. It is based on a previous approach, but we have adapted it to take advantage of the features of social network graphs (fewer iterations but more unbalanced). The second approach, referred as Concurrent, allows the execution of specific iterations concurrently in both devices. Our heterogenous implementations can be up to 1.56x faster and 1.32x more energy efficient with respect to the best of only-CPU or only-GPU baselines. We have also found that for a highly memory bound problem like BFS, the CPU-GPU collaborative execution is limited by the shared-memory bus bandwidth.
AB - Breadth-First Search (BFS) is the core of many graph analysis algorithms and it is used in many problems, such as social network, computer network analysis, and data organization. BFS is an iterative algorithm that due to its irregular behavior is quite challenging to parallelize. Several approaches implement efficient algorithms for BFS for multicore architectures and for Graphics Processors, but it is still an open problem how to distribute the work among the main cores and the accelerators. In this paper, we assess several approaches to perform BFS on different heterogenous architectures (highend and embedded mobile processors composed of a multi-core CPU and an integrated GPU) with a focus on social network graphs. In particular, we propose two heterogenous approaches to exploit both devices. The first one, called Selective, selects on which device to execute each iteration. It is based on a previous approach, but we have adapted it to take advantage of the features of social network graphs (fewer iterations but more unbalanced). The second approach, referred as Concurrent, allows the execution of specific iterations concurrently in both devices. Our heterogenous implementations can be up to 1.56x faster and 1.32x more energy efficient with respect to the best of only-CPU or only-GPU baselines. We have also found that for a highly memory bound problem like BFS, the CPU-GPU collaborative execution is limited by the shared-memory bus bandwidth.
KW - Graphs
KW - bfs
KW - breadth first search
KW - gpu
KW - heterogeneous
KW - social networks
UR - http://www.scopus.com/inward/record.url?scp=85010461739&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85010461739&partnerID=8YFLogxK
U2 - 10.1109/SBAC-PAD.2016.23
DO - 10.1109/SBAC-PAD.2016.23
M3 - Conference contribution
AN - SCOPUS:85010461739
T3 - Proceedings - Symposium on Computer Architecture and High Performance Computing
SP - 118
EP - 125
BT - Proceedings - 28th IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2016
PB - IEEE Computer Society
Y2 - 26 October 2016 through 28 October 2016
ER -