Comparison based sorting for systems with multiple GPUs

Ivan Tanasic, Lluís Vilanova, Marc Jordà, Javier Cabezas, Isaac Gelado, Nacho Navarro, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

As a basic building block of many applications, sorting algorithms that efficiently run on modern machines are key for the performance of these applications. With the recent shift to using GPUs for general purpose compuing, researches have proposed several sorting algorithms for single-GPU systems. However, some workstations and HPC systems have multiple GPUs, and applications running on them are de- signed to use all available GPUs in the system. In this paper we present a high performance multi-GPU merge sort algorithm that solves the problem of sorting data distributed across several GPUs. Our merge sort algorithm first sorts the data on each GPU using an existing single- GPU sorting algorithm. Then, a series of merge steps pro- duce a globally sorted array distributed across all the GPUs in the system. This merge phase is enabled by a novel pivot selection algorithm that ensures that merge steps always distribute data evenly among all GPUs. We also present the implementation of our sorting algorithm in CUDA, as well as a novel inter-GPU communication technique that enables this pivot selection algorithm. Experimental results show that an efficient implementation of our algorithm achieves a speed up of 1.9x when running on two GPUs and 3.3x when running on four GPUs, compared to sorting on a single GPU. At the same time, it is able to sort two and four times more records, compared to sorting on one GPU.

Original languageEnglish (US)
Title of host publicationProceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013
Pages1-11
Number of pages11
DOIs
StatePublished - Apr 15 2013
Event6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013 - Houston, TX, United States
Duration: Mar 16 2013Mar 16 2013

Publication series

NameACM International Conference Proceeding Series

Other

Other6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013
CountryUnited States
CityHouston, TX
Period3/16/133/16/13

Fingerprint

Sorting
Graphics processing unit

Keywords

  • CUDA
  • GPU
  • Parallel
  • Sorting

ASJC Scopus subject areas

  • Human-Computer Interaction
  • Computer Networks and Communications
  • Computer Vision and Pattern Recognition
  • Software

Cite this

Tanasic, I., Vilanova, L., Jordà, M., Cabezas, J., Gelado, I., Navarro, N., & Hwu, W-M. W. (2013). Comparison based sorting for systems with multiple GPUs. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013 (pp. 1-11). (ACM International Conference Proceeding Series). https://doi.org/10.1145/2458523.2458524

Comparison based sorting for systems with multiple GPUs. / Tanasic, Ivan; Vilanova, Lluís; Jordà, Marc; Cabezas, Javier; Gelado, Isaac; Navarro, Nacho; Hwu, Wen-Mei W.

Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013. 2013. p. 1-11 (ACM International Conference Proceeding Series).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Tanasic, I, Vilanova, L, Jordà, M, Cabezas, J, Gelado, I, Navarro, N & Hwu, W-MW 2013, Comparison based sorting for systems with multiple GPUs. in Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013. ACM International Conference Proceeding Series, pp. 1-11, 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013, Houston, TX, United States, 3/16/13. https://doi.org/10.1145/2458523.2458524
Tanasic I, Vilanova L, Jordà M, Cabezas J, Gelado I, Navarro N et al. Comparison based sorting for systems with multiple GPUs. In Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013. 2013. p. 1-11. (ACM International Conference Proceeding Series). https://doi.org/10.1145/2458523.2458524
Tanasic, Ivan ; Vilanova, Lluís ; Jordà, Marc ; Cabezas, Javier ; Gelado, Isaac ; Navarro, Nacho ; Hwu, Wen-Mei W. / Comparison based sorting for systems with multiple GPUs. Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013. 2013. pp. 1-11 (ACM International Conference Proceeding Series).
@inproceedings{4126990bb52d491c833e4c34d0600a3c,
title = "Comparison based sorting for systems with multiple GPUs",
abstract = "As a basic building block of many applications, sorting algorithms that efficiently run on modern machines are key for the performance of these applications. With the recent shift to using GPUs for general purpose compuing, researches have proposed several sorting algorithms for single-GPU systems. However, some workstations and HPC systems have multiple GPUs, and applications running on them are de- signed to use all available GPUs in the system. In this paper we present a high performance multi-GPU merge sort algorithm that solves the problem of sorting data distributed across several GPUs. Our merge sort algorithm first sorts the data on each GPU using an existing single- GPU sorting algorithm. Then, a series of merge steps pro- duce a globally sorted array distributed across all the GPUs in the system. This merge phase is enabled by a novel pivot selection algorithm that ensures that merge steps always distribute data evenly among all GPUs. We also present the implementation of our sorting algorithm in CUDA, as well as a novel inter-GPU communication technique that enables this pivot selection algorithm. Experimental results show that an efficient implementation of our algorithm achieves a speed up of 1.9x when running on two GPUs and 3.3x when running on four GPUs, compared to sorting on a single GPU. At the same time, it is able to sort two and four times more records, compared to sorting on one GPU.",
keywords = "CUDA, GPU, Parallel, Sorting",
author = "Ivan Tanasic and Llu{\'i}s Vilanova and Marc Jord{\`a} and Javier Cabezas and Isaac Gelado and Nacho Navarro and Hwu, {Wen-Mei W}",
year = "2013",
month = "4",
day = "15",
doi = "10.1145/2458523.2458524",
language = "English (US)",
isbn = "9781450320177",
series = "ACM International Conference Proceeding Series",
pages = "1--11",
booktitle = "Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013",

}

TY - GEN

T1 - Comparison based sorting for systems with multiple GPUs

AU - Tanasic, Ivan

AU - Vilanova, Lluís

AU - Jordà, Marc

AU - Cabezas, Javier

AU - Gelado, Isaac

AU - Navarro, Nacho

AU - Hwu, Wen-Mei W

PY - 2013/4/15

Y1 - 2013/4/15

N2 - As a basic building block of many applications, sorting algorithms that efficiently run on modern machines are key for the performance of these applications. With the recent shift to using GPUs for general purpose compuing, researches have proposed several sorting algorithms for single-GPU systems. However, some workstations and HPC systems have multiple GPUs, and applications running on them are de- signed to use all available GPUs in the system. In this paper we present a high performance multi-GPU merge sort algorithm that solves the problem of sorting data distributed across several GPUs. Our merge sort algorithm first sorts the data on each GPU using an existing single- GPU sorting algorithm. Then, a series of merge steps pro- duce a globally sorted array distributed across all the GPUs in the system. This merge phase is enabled by a novel pivot selection algorithm that ensures that merge steps always distribute data evenly among all GPUs. We also present the implementation of our sorting algorithm in CUDA, as well as a novel inter-GPU communication technique that enables this pivot selection algorithm. Experimental results show that an efficient implementation of our algorithm achieves a speed up of 1.9x when running on two GPUs and 3.3x when running on four GPUs, compared to sorting on a single GPU. At the same time, it is able to sort two and four times more records, compared to sorting on one GPU.

AB - As a basic building block of many applications, sorting algorithms that efficiently run on modern machines are key for the performance of these applications. With the recent shift to using GPUs for general purpose compuing, researches have proposed several sorting algorithms for single-GPU systems. However, some workstations and HPC systems have multiple GPUs, and applications running on them are de- signed to use all available GPUs in the system. In this paper we present a high performance multi-GPU merge sort algorithm that solves the problem of sorting data distributed across several GPUs. Our merge sort algorithm first sorts the data on each GPU using an existing single- GPU sorting algorithm. Then, a series of merge steps pro- duce a globally sorted array distributed across all the GPUs in the system. This merge phase is enabled by a novel pivot selection algorithm that ensures that merge steps always distribute data evenly among all GPUs. We also present the implementation of our sorting algorithm in CUDA, as well as a novel inter-GPU communication technique that enables this pivot selection algorithm. Experimental results show that an efficient implementation of our algorithm achieves a speed up of 1.9x when running on two GPUs and 3.3x when running on four GPUs, compared to sorting on a single GPU. At the same time, it is able to sort two and four times more records, compared to sorting on one GPU.

KW - CUDA

KW - GPU

KW - Parallel

KW - Sorting

UR - http://www.scopus.com/inward/record.url?scp=84875986099&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84875986099&partnerID=8YFLogxK

U2 - 10.1145/2458523.2458524

DO - 10.1145/2458523.2458524

M3 - Conference contribution

AN - SCOPUS:84875986099

SN - 9781450320177

T3 - ACM International Conference Proceeding Series

SP - 1

EP - 11

BT - Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU 2013

ER -