Abstract
We present techniques to process large scale-free graphs in distributed memory. Our aim is to scale to trillions of edges, and our research is targeted at leadership class supercomputers and clusters with local non-volatile memory, e.g., NAND Flash. We apply an edge list partitioning technique, designed to accommodate high-degree vertices (hubs) that create scaling challenges when processing scale-free graphs. In addition to partitioning hubs, we use ghost vertices to represent the hubs to reduce communication hotspots. We present a scaling study with three important graph algorithms: Breadth-First Search (BFS), K-Core decomposition, and Triangle Counting. We also demonstrate scalability on BG/P Intrepid by comparing to best known Graph500 results. We show results on two clusters with local NVRAM storage that are capable of traversing trillion-edge scale-free graphs. By leveraging node-local NAND Flash, our approach can process thirty-two times larger datasets with only a 39% performance degradation in Traversed Edges Per Second (TEPS).
Original language | English (US) |
---|---|
Pages | 825-836 |
Number of pages | 12 |
DOIs | |
State | Published - 2013 |
Externally published | Yes |
Event | 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 - Boston, MA, United States Duration: May 20 2013 → May 24 2013 |
Other
Other | 27th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2013 |
---|---|
Country/Territory | United States |
City | Boston, MA |
Period | 5/20/13 → 5/24/13 |
Keywords
- big data
- distributed computing
- graph algorithms
- parallel algorithms
ASJC Scopus subject areas
- Software