In-place data sliding algorithms for many-core architectures

Juan Gómez Luna, Li Wen Chang, I. Jui Sung, Wen-Mei W Hwu, Nicolás Guil

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements in a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as GPUs, is specially challenging due to the difficulties in synchronizing threads executing on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of 5 benchmarks, we validate our approaches and compare them to the state-of-the-art implementations of these benchmarks. Our regular DS algorithms demonstrate up to 9.11x and 73.25x on NVIDIA and AMD GPUs, respectively, the throughput of their competitors. Our irregular DS algorithms outperform NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.

Original languageEnglish (US)
Title of host publicationProceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages210-219
Number of pages10
ISBN (Electronic)9781467375870
DOIs
StatePublished - Dec 8 2015
Event44th International Conference on Parallel Processing, ICPP 2015 - Beijing, China
Duration: Sep 1 2015Sep 4 2015

Publication series

NameProceedings of the International Conference on Parallel Processing
Volume2015-December
ISSN (Print)0190-3918

Other

Other44th International Conference on Parallel Processing, ICPP 2015
CountryChina
CityBeijing
Period9/1/159/4/15

Fingerprint

Many-core
Data storage equipment
Irregular
Benchmark
Algebra
Data structures
Relational Algebra
Synchronization
Compaction
Throughput
Data Streams
Thread
Architecture
Manipulation
Data Structures
Adjacent
Graphics processing unit
Unit
Requirements
Demonstrate

Keywords

  • In-place
  • Relational algebra
  • Stream compaction

ASJC Scopus subject areas

  • Software
  • Mathematics(all)
  • Hardware and Architecture

Cite this

Luna, J. G., Chang, L. W., Sung, I. J., Hwu, W-M. W., & Guil, N. (2015). In-place data sliding algorithms for many-core architectures. In Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015 (pp. 210-219). [7349576] (Proceedings of the International Conference on Parallel Processing; Vol. 2015-December). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/ICPP.2015.30

In-place data sliding algorithms for many-core architectures. / Luna, Juan Gómez; Chang, Li Wen; Sung, I. Jui; Hwu, Wen-Mei W; Guil, Nicolás.

Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015. Institute of Electrical and Electronics Engineers Inc., 2015. p. 210-219 7349576 (Proceedings of the International Conference on Parallel Processing; Vol. 2015-December).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Luna, JG, Chang, LW, Sung, IJ, Hwu, W-MW & Guil, N 2015, In-place data sliding algorithms for many-core architectures. in Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015., 7349576, Proceedings of the International Conference on Parallel Processing, vol. 2015-December, Institute of Electrical and Electronics Engineers Inc., pp. 210-219, 44th International Conference on Parallel Processing, ICPP 2015, Beijing, China, 9/1/15. https://doi.org/10.1109/ICPP.2015.30
Luna JG, Chang LW, Sung IJ, Hwu W-MW, Guil N. In-place data sliding algorithms for many-core architectures. In Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 210-219. 7349576. (Proceedings of the International Conference on Parallel Processing). https://doi.org/10.1109/ICPP.2015.30
Luna, Juan Gómez ; Chang, Li Wen ; Sung, I. Jui ; Hwu, Wen-Mei W ; Guil, Nicolás. / In-place data sliding algorithms for many-core architectures. Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 210-219 (Proceedings of the International Conference on Parallel Processing).
@inproceedings{0743d4d1a1514e66aac02aea92579675,
title = "In-place data sliding algorithms for many-core architectures",
abstract = "In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements in a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as GPUs, is specially challenging due to the difficulties in synchronizing threads executing on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of 5 benchmarks, we validate our approaches and compare them to the state-of-the-art implementations of these benchmarks. Our regular DS algorithms demonstrate up to 9.11x and 73.25x on NVIDIA and AMD GPUs, respectively, the throughput of their competitors. Our irregular DS algorithms outperform NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.",
keywords = "In-place, Relational algebra, Stream compaction",
author = "Luna, {Juan G{\'o}mez} and Chang, {Li Wen} and Sung, {I. Jui} and Hwu, {Wen-Mei W} and Nicol{\'a}s Guil",
year = "2015",
month = "12",
day = "8",
doi = "10.1109/ICPP.2015.30",
language = "English (US)",
series = "Proceedings of the International Conference on Parallel Processing",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "210--219",
booktitle = "Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015",
address = "United States",

}

TY - GEN

T1 - In-place data sliding algorithms for many-core architectures

AU - Luna, Juan Gómez

AU - Chang, Li Wen

AU - Sung, I. Jui

AU - Hwu, Wen-Mei W

AU - Guil, Nicolás

PY - 2015/12/8

Y1 - 2015/12/8

N2 - In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements in a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as GPUs, is specially challenging due to the difficulties in synchronizing threads executing on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of 5 benchmarks, we validate our approaches and compare them to the state-of-the-art implementations of these benchmarks. Our regular DS algorithms demonstrate up to 9.11x and 73.25x on NVIDIA and AMD GPUs, respectively, the throughput of their competitors. Our irregular DS algorithms outperform NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.

AB - In-place data manipulation is very desirable in many-core architectures with limited on-board memory. This paper deals with the in-place implementation of a class of primitives that perform data movements in one direction. We call these primitives Data Sliding (DS) algorithms. Notable among them are relational algebra primitives (such as select and unique), padding to insert empty elements in a data structure, and stream compaction to reduce memory requirements. Their in-place implementation in a bulk synchronous parallel model, such as GPUs, is specially challenging due to the difficulties in synchronizing threads executing on different compute units. Using a novel adjacent work-group synchronization technique, we propose two algorithmic schemes for regular and irregular DS algorithms. With a set of 5 benchmarks, we validate our approaches and compare them to the state-of-the-art implementations of these benchmarks. Our regular DS algorithms demonstrate up to 9.11x and 73.25x on NVIDIA and AMD GPUs, respectively, the throughput of their competitors. Our irregular DS algorithms outperform NVIDIA Thrust library by up to 3.24x on the three most recent generations of NVIDIA GPUs.

KW - In-place

KW - Relational algebra

KW - Stream compaction

UR - http://www.scopus.com/inward/record.url?scp=84976501593&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84976501593&partnerID=8YFLogxK

U2 - 10.1109/ICPP.2015.30

DO - 10.1109/ICPP.2015.30

M3 - Conference contribution

AN - SCOPUS:84976501593

T3 - Proceedings of the International Conference on Parallel Processing

SP - 210

EP - 219

BT - Proceedings - 2015 44th International Annual Conference on Parallel Processing, ICPP 2015

PB - Institute of Electrical and Electronics Engineers Inc.

ER -