TY - GEN

T1 - Fast approximate distance queries in unweighted graphs using bounded asynchrony

AU - Fidel, Adam

AU - Sabido, Francisco Coral

AU - Riedel, Colton

AU - Amato, Nancy Marie

AU - Rauchwerger, Lawrence

N1 - Funding Information:
This research supported in part by NSF awards CNS-0551685, CCF 0702765, CCF- 0833199, CCF-1439145, CCF-1423111, CCF-0830753, IIS-0916053, IIS-0917266, EFRI- 1240483, RI-1217991, by NIH NCI R25 CA090301-11, and by DOE awards DE-AC02- 06CH11357, DE-NA0002376, B575363. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231

PY - 2017

Y1 - 2017

N2 - We introduce a new parallel algorithm for approximate breadth-first ordering of an unweighted graph by using bounded asynchrony to parametrically control both the performance and error of the algorithm. This work is based on the k-level asynchronous (KLA) paradigm that trades expensive global synchronizations in the levelsynchronous model for local synchronizations in the asynchronous model, which may result in redundant work. Instead of correcting errors introduced by asynchrony and redoing work as in KLA, in this work we control the amount of work that is redone and thus the amount of error allowed, leading to higher performance at the expense of a loss of precision. Results of an implementation of this algorithm are presented on up to 32,768 cores, showing 2.27x improvement over the exact KLA algorithm and 3.8x improvement over the level-synchronous version with minimal error on several graph inputs.

AB - We introduce a new parallel algorithm for approximate breadth-first ordering of an unweighted graph by using bounded asynchrony to parametrically control both the performance and error of the algorithm. This work is based on the k-level asynchronous (KLA) paradigm that trades expensive global synchronizations in the levelsynchronous model for local synchronizations in the asynchronous model, which may result in redundant work. Instead of correcting errors introduced by asynchrony and redoing work as in KLA, in this work we control the amount of work that is redone and thus the amount of error allowed, leading to higher performance at the expense of a loss of precision. Results of an implementation of this algorithm are presented on up to 32,768 cores, showing 2.27x improvement over the exact KLA algorithm and 3.8x improvement over the level-synchronous version with minimal error on several graph inputs.

KW - Approximate algorithms

KW - Asynchronous

KW - Breadth-first search

KW - Distance query

KW - Distributed memory

KW - Parallel graph algorithms

UR - http://www.scopus.com/inward/record.url?scp=85011422305&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85011422305&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-52709-3_4

DO - 10.1007/978-3-319-52709-3_4

M3 - Conference contribution

AN - SCOPUS:85011422305

SN - 9783319527086

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 40

EP - 54

BT - Languages and Compilers for Parallel Computing - 29th International Workshop, LCPC 2016, Revised Papers

A2 - Ding, Chen

A2 - Criswell, John

A2 - Wu, Peng

PB - Springer-Verlag Berlin Heidelberg

T2 - 29th International Workshop on Languages and Compilers for Parallel Computing, LCPC 2016

Y2 - 28 September 2016 through 30 September 2016

ER -