Histogram sort with sampling

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of the keys, while histogramming enables evaluation and iterative improvement of a given partition. We introduce Histogram sort with sampling (HSS), which combines sampling and iterative histogramming to find high-quality partitions with minimal data movement and high practical performance. Compared to the best known (recently introduced) algorithm for finding these partitions, our algorithm requires a factor of Θ(log(p)/ log log(p)) less communication, and substantially less when compared to standard variants of Sample sort and Histogram sort. We provide a distributed-memory implementation of the proposed algorithm, compare its performance to two existing implementations, and provide a brief application study showing benefit of the new algorithm.

Original languageEnglish (US)
Title of host publicationSPAA 2019 - Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures
PublisherAssociation for Computing Machinery
Pages201-212
Number of pages12
ISBN (Electronic)9781450361842
DOIs
StatePublished - Jun 17 2019
Event31st ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2019 - Phoenix, United States
Duration: Jun 22 2019Jun 24 2019

Publication series

NameAnnual ACM Symposium on Parallelism in Algorithms and Architectures

Conference

Conference31st ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2019
CountryUnited States
CityPhoenix
Period6/22/196/24/19

Fingerprint

Histogram
Sort
Partition
Sampling
Sorting algorithm
Distributed Memory
Redistribution
Parallel Algorithms
Partitioning
Sorting
Minimise
Subset
Evaluation
Data storage equipment
Communication
Movement

Keywords

  • Data partitioning
  • Histogramming
  • Parallel sorting
  • Sampling

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Cite this

Harsh, V., Kale, L. V., & Solomonik, E. V. (2019). Histogram sort with sampling. In SPAA 2019 - Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures (pp. 201-212). (Annual ACM Symposium on Parallelism in Algorithms and Architectures). Association for Computing Machinery. https://doi.org/10.1145/3323165.3323184

Histogram sort with sampling. / Harsh, Vipul; Kale, Laxmikant V; Solomonik, Edgar Vadimovich.

SPAA 2019 - Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures. Association for Computing Machinery, 2019. p. 201-212 (Annual ACM Symposium on Parallelism in Algorithms and Architectures).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Harsh, V, Kale, LV & Solomonik, EV 2019, Histogram sort with sampling. in SPAA 2019 - Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures. Annual ACM Symposium on Parallelism in Algorithms and Architectures, Association for Computing Machinery, pp. 201-212, 31st ACM Symposium on Parallelism in Algorithms and Architectures, SPAA 2019, Phoenix, United States, 6/22/19. https://doi.org/10.1145/3323165.3323184
Harsh V, Kale LV, Solomonik EV. Histogram sort with sampling. In SPAA 2019 - Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures. Association for Computing Machinery. 2019. p. 201-212. (Annual ACM Symposium on Parallelism in Algorithms and Architectures). https://doi.org/10.1145/3323165.3323184
Harsh, Vipul ; Kale, Laxmikant V ; Solomonik, Edgar Vadimovich. / Histogram sort with sampling. SPAA 2019 - Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures. Association for Computing Machinery, 2019. pp. 201-212 (Annual ACM Symposium on Parallelism in Algorithms and Architectures).
@inproceedings{de5bc966caee4cb080614944b0ebfb91,
title = "Histogram sort with sampling",
abstract = "To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of the keys, while histogramming enables evaluation and iterative improvement of a given partition. We introduce Histogram sort with sampling (HSS), which combines sampling and iterative histogramming to find high-quality partitions with minimal data movement and high practical performance. Compared to the best known (recently introduced) algorithm for finding these partitions, our algorithm requires a factor of Θ(log(p)/ log log(p)) less communication, and substantially less when compared to standard variants of Sample sort and Histogram sort. We provide a distributed-memory implementation of the proposed algorithm, compare its performance to two existing implementations, and provide a brief application study showing benefit of the new algorithm.",
keywords = "Data partitioning, Histogramming, Parallel sorting, Sampling",
author = "Vipul Harsh and Kale, {Laxmikant V} and Solomonik, {Edgar Vadimovich}",
year = "2019",
month = "6",
day = "17",
doi = "10.1145/3323165.3323184",
language = "English (US)",
series = "Annual ACM Symposium on Parallelism in Algorithms and Architectures",
publisher = "Association for Computing Machinery",
pages = "201--212",
booktitle = "SPAA 2019 - Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures",

}

TY - GEN

T1 - Histogram sort with sampling

AU - Harsh, Vipul

AU - Kale, Laxmikant V

AU - Solomonik, Edgar Vadimovich

PY - 2019/6/17

Y1 - 2019/6/17

N2 - To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of the keys, while histogramming enables evaluation and iterative improvement of a given partition. We introduce Histogram sort with sampling (HSS), which combines sampling and iterative histogramming to find high-quality partitions with minimal data movement and high practical performance. Compared to the best known (recently introduced) algorithm for finding these partitions, our algorithm requires a factor of Θ(log(p)/ log log(p)) less communication, and substantially less when compared to standard variants of Sample sort and Histogram sort. We provide a distributed-memory implementation of the proposed algorithm, compare its performance to two existing implementations, and provide a brief application study showing benefit of the new algorithm.

AB - To minimize data movement, state-of-the-art parallel sorting algorithms use techniques based on sampling and histogramming to partition keys prior to redistribution. Sampling enables partitioning to be done using a representative subset of the keys, while histogramming enables evaluation and iterative improvement of a given partition. We introduce Histogram sort with sampling (HSS), which combines sampling and iterative histogramming to find high-quality partitions with minimal data movement and high practical performance. Compared to the best known (recently introduced) algorithm for finding these partitions, our algorithm requires a factor of Θ(log(p)/ log log(p)) less communication, and substantially less when compared to standard variants of Sample sort and Histogram sort. We provide a distributed-memory implementation of the proposed algorithm, compare its performance to two existing implementations, and provide a brief application study showing benefit of the new algorithm.

KW - Data partitioning

KW - Histogramming

KW - Parallel sorting

KW - Sampling

UR - http://www.scopus.com/inward/record.url?scp=85068671680&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068671680&partnerID=8YFLogxK

U2 - 10.1145/3323165.3323184

DO - 10.1145/3323165.3323184

M3 - Conference contribution

AN - SCOPUS:85068671680

T3 - Annual ACM Symposium on Parallelism in Algorithms and Architectures

SP - 201

EP - 212

BT - SPAA 2019 - Proceedings of the 31st ACM Symposium on Parallelism in Algorithms and Architectures

PB - Association for Computing Machinery

ER -