TY - GEN
T1 - Collaborative (CPU + GPU) Algorithms for Triangle Counting and Truss Decomposition
AU - Mailthody, Vikram S.
AU - Date, Ketan
AU - Qureshi, Zaid
AU - Pearson, Carl
AU - Nagi, Rakesh
AU - Xiong, Jinjun
AU - Hwu, Wen Mei
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/11/26
Y1 - 2018/11/26
N2 - In this paper, we present an update to our previous submission from Graph Challenge 2017. This work describes and evaluates new software algorithm optimizations undertaken for our 2018 year submission on Collaborative CPU+GPU Algorithms for Triangle Counting and Truss Decomposition. First, we describe four major optimizations for the triangle counting which improved performance by up to 117x over our prior submission. Additionally, we show that our triangle-counting algorithm is on average 151.7x faster than NVIDIA's NVGraph library (max 476x) for SNAP datasets. Second, we propose a novel parallel k-truss decomposition algorithm that is time-efficient and is up to 13.9x faster than our previous submission. Third, we evaluate the effect of generational hardware improvements between the IBM 'Minsky' (POWER8, P100, NVLink 1.0) and 'Newell' (POWER9, V100, NVLink 2.0) platforms. Lastly, the software optimizations presented in this work and the hardware improvements in the Newell platform enable analytics and discovery on large graphs with millions of nodes and billions of edges in less than a minute. In sum, the new algorithmic implementations are significantly faster and can handle much larger 'big' graphs.
AB - In this paper, we present an update to our previous submission from Graph Challenge 2017. This work describes and evaluates new software algorithm optimizations undertaken for our 2018 year submission on Collaborative CPU+GPU Algorithms for Triangle Counting and Truss Decomposition. First, we describe four major optimizations for the triangle counting which improved performance by up to 117x over our prior submission. Additionally, we show that our triangle-counting algorithm is on average 151.7x faster than NVIDIA's NVGraph library (max 476x) for SNAP datasets. Second, we propose a novel parallel k-truss decomposition algorithm that is time-efficient and is up to 13.9x faster than our previous submission. Third, we evaluate the effect of generational hardware improvements between the IBM 'Minsky' (POWER8, P100, NVLink 1.0) and 'Newell' (POWER9, V100, NVLink 2.0) platforms. Lastly, the software optimizations presented in this work and the hardware improvements in the Newell platform enable analytics and discovery on large graphs with millions of nodes and billions of edges in less than a minute. In sum, the new algorithmic implementations are significantly faster and can handle much larger 'big' graphs.
KW - CUDA
KW - Collaborative graph algorithms
KW - GPU
KW - Triangle counting
KW - Truss decomposition
UR - http://www.scopus.com/inward/record.url?scp=85060097076&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85060097076&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2018.8547517
DO - 10.1109/HPEC.2018.8547517
M3 - Conference contribution
AN - SCOPUS:85060097076
T3 - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
BT - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE High Performance Extreme Computing Conference, HPEC 2018
Y2 - 25 September 2018 through 27 September 2018
ER -