TY - GEN
T1 - Snug: Architectural support for relaxed concurrent priority queueing in chip multiprocessors
T2 - 34th ACM International Conference on Supercomputing, ICS 2020
AU - Heidarshenas, Azin
AU - Gangwani, Tanmay
AU - Yesil, Serif
AU - Morrison, Adam
AU - Torrellas, Josep
N1 - Publisher Copyright:
© 2020 ACM.
PY - 2020/6/29
Y1 - 2020/6/29
N2 - Many parallel algorithms in domains such as graph analytics and simulations rely on priority-based task scheduling. In such environments, the data structure of choice is a concurrent priority queue (PQ). Unfortunately, PQ algorithms exhibit an undesirable tradeoff. On one hand, strict PQs always dequeue the highest-priority task, and thus fail to scale because of contention at the head of the queue. On the other hand, relaxed PQs avoid contention by dequeuing tasks that are sometimes so far from the head that the resulting schedule misses the benefit of priority-based scheduling. We propose a novel architecture for relaxing PQs without straying far from the priority-based schedule. Our chip-level architecture, called Snug, distributes the PQ into subqueues and maintains a set of Work registers that point to the highest-priority task in each subqueue. Snug provides an instruction that picks a high-quality task to execute. The instruction periodically switches between obtaining an accurate global snapshot and visiting only local subqueues to reduce traffic. Overall, Snug dequeues high-quality tasks while avoiding both hotspots and excessive network traffic. We evaluate Snug on graph analytics and event simulation programs. On a simulated 64-core chip, Snug reduces the average execution time of the programs by 1.4X, 2.4X, and 3.6X compared to state-of-the-art concurrent skip list, SprayList, and software-distributed PQs, respectively.
AB - Many parallel algorithms in domains such as graph analytics and simulations rely on priority-based task scheduling. In such environments, the data structure of choice is a concurrent priority queue (PQ). Unfortunately, PQ algorithms exhibit an undesirable tradeoff. On one hand, strict PQs always dequeue the highest-priority task, and thus fail to scale because of contention at the head of the queue. On the other hand, relaxed PQs avoid contention by dequeuing tasks that are sometimes so far from the head that the resulting schedule misses the benefit of priority-based scheduling. We propose a novel architecture for relaxing PQs without straying far from the priority-based schedule. Our chip-level architecture, called Snug, distributes the PQ into subqueues and maintains a set of Work registers that point to the highest-priority task in each subqueue. Snug provides an instruction that picks a high-quality task to execute. The instruction periodically switches between obtaining an accurate global snapshot and visiting only local subqueues to reduce traffic. Overall, Snug dequeues high-quality tasks while avoiding both hotspots and excessive network traffic. We evaluate Snug on graph analytics and event simulation programs. On a simulated 64-core chip, Snug reduces the average execution time of the programs by 1.4X, 2.4X, and 3.6X compared to state-of-the-art concurrent skip list, SprayList, and software-distributed PQs, respectively.
KW - concurrent priority queues
KW - graph algorithms
KW - hardware acceleration
KW - multicore architectures
UR - http://www.scopus.com/inward/record.url?scp=85088525857&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85088525857&partnerID=8YFLogxK
U2 - 10.1145/3392717.3392740
DO - 10.1145/3392717.3392740
M3 - Conference contribution
AN - SCOPUS:85088525857
T3 - Proceedings of the International Conference on Supercomputing
BT - Proceedings of the 34th ACM International Conference on Supercomputing, ICS 2020
PB - Association for Computing Machinery
Y2 - 29 June 2020 through 2 July 2020
ER -