Abstract
In this paper, we present collaborative CPU + GPU algorithms for triangle counting and truss decomposition, the two fundamental problems in graph analytics. We describe the implementation details and present experimental evaluation on the IBM Minsky platform. The main contribution of this paper is a thorough benchmarking and comparison of the different memory management schemes offered by CUDA 8 and NVLink, which can be harnessed for tackling large problems where the limited GPU memory capacity is the primary bottleneck in traditional computing platform. We find that the collaborative algorithms achieve 28× speedup on average (180× max) for triangle counting, and 165× speedup on average (498× max) for truss decomposition, when compared with the baseline Python implementation provided by the Graph Challenge organizers.
Original language | English (US) |
---|---|
Title of host publication | 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781538634721 |
DOIs | |
State | Published - Oct 30 2017 |
Event | 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017 - Waltham, United States Duration: Sep 12 2017 → Sep 14 2017 |
Publication series
Name | 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017 |
---|
Other
Other | 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017 |
---|---|
Country | United States |
City | Waltham |
Period | 9/12/17 → 9/14/17 |
Fingerprint
Keywords
- CUDA
- GPU
- collaborative graph algorithms
- triangle counting
- truss decomposition
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Hardware and Architecture
- Computer Networks and Communications
Cite this
Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture : Static graph challenge: Subgraph isomorphism. / Date, Ketan; Feng, Keven; Nagi, Rakesh; Xiong, Jinjun; Kim, Nam Sung; Hwu, Wen-Mei W.
2017 IEEE High Performance Extreme Computing Conference, HPEC 2017. Institute of Electrical and Electronics Engineers Inc., 2017. 8091042 (2017 IEEE High Performance Extreme Computing Conference, HPEC 2017).Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
}
TY - GEN
T1 - Collaborative (CPU + GPU) algorithms for triangle counting and truss decomposition on the Minsky architecture
T2 - Static graph challenge: Subgraph isomorphism
AU - Date, Ketan
AU - Feng, Keven
AU - Nagi, Rakesh
AU - Xiong, Jinjun
AU - Kim, Nam Sung
AU - Hwu, Wen-Mei W
PY - 2017/10/30
Y1 - 2017/10/30
N2 - In this paper, we present collaborative CPU + GPU algorithms for triangle counting and truss decomposition, the two fundamental problems in graph analytics. We describe the implementation details and present experimental evaluation on the IBM Minsky platform. The main contribution of this paper is a thorough benchmarking and comparison of the different memory management schemes offered by CUDA 8 and NVLink, which can be harnessed for tackling large problems where the limited GPU memory capacity is the primary bottleneck in traditional computing platform. We find that the collaborative algorithms achieve 28× speedup on average (180× max) for triangle counting, and 165× speedup on average (498× max) for truss decomposition, when compared with the baseline Python implementation provided by the Graph Challenge organizers.
AB - In this paper, we present collaborative CPU + GPU algorithms for triangle counting and truss decomposition, the two fundamental problems in graph analytics. We describe the implementation details and present experimental evaluation on the IBM Minsky platform. The main contribution of this paper is a thorough benchmarking and comparison of the different memory management schemes offered by CUDA 8 and NVLink, which can be harnessed for tackling large problems where the limited GPU memory capacity is the primary bottleneck in traditional computing platform. We find that the collaborative algorithms achieve 28× speedup on average (180× max) for triangle counting, and 165× speedup on average (498× max) for truss decomposition, when compared with the baseline Python implementation provided by the Graph Challenge organizers.
KW - CUDA
KW - GPU
KW - collaborative graph algorithms
KW - triangle counting
KW - truss decomposition
UR - http://www.scopus.com/inward/record.url?scp=85041203226&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85041203226&partnerID=8YFLogxK
U2 - 10.1109/HPEC.2017.8091042
DO - 10.1109/HPEC.2017.8091042
M3 - Conference contribution
AN - SCOPUS:85041203226
T3 - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
BT - 2017 IEEE High Performance Extreme Computing Conference, HPEC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
ER -