KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism

Izzat El Hajj, Juan Gomez-Luna, Cheng Li, Li Wen Chang, Dejan Milojicic, Wen Mei Hwu

Research output: Chapter in Book/Report/Conference proceeding - Conference contribution

Abstract

Dynamic parallelism on GPUs simplifies the programming of many classes of applications that generate parallelizable work not known prior to execution. However, modern GPU architectures do not support dynamic parallelism efficiently due to the high kernel launch overhead, the limited number of simultaneous kernels, and the limited depth of dynamic calls a device can support. In this paper, we propose Kernel Launch Aggregation and Promotion (KLAP), a set of compiler techniques that improve the performance of kernels that use dynamic parallelism. Kernel launch aggregation fuses kernels launched by threads in the same warp, block, or kernel into a single aggregated kernel, thereby reducing the total number of kernels spawned and increasing the amount of work per kernel to improve occupancy. Kernel launch promotion enables early launch of child kernels to extract more parallelism between parents and children, and to aggregate kernel launches across generations, mitigating the problem of limited depth. We implement our techniques in a real compiler and show that kernel launch aggregation obtains a geometric mean speedup of 6.58x over regular dynamic parallelism. We also show that kernel launch promotion enables cases that were not originally possible, improving throughput by a geometric mean of 30.44x.
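The abstract does not spell out how a single aggregated kernel recovers the work of each launch it replaces. One plausible scheme, sketched below as a host-side Python simulation rather than CUDA and not taken from the KLAP compiler itself, is to take an exclusive prefix sum over the child grid sizes requested by each parent thread, launch one grid of the summed size, and let every block of that grid find its original parent by binary search over the prefix sum. The function names (`aggregate`, `locate`, `launches_per_granularity`) are hypothetical, the 32-thread warp width is an assumption about NVIDIA hardware, and the launch counts assume every parent thread launches exactly one child kernel.

```python
import math
from bisect import bisect_right
from itertools import accumulate

WARP_SIZE = 32  # assumption: NVIDIA warp width


def launches_per_granularity(num_blocks, threads_per_block):
    """Number of child-kernel launches for each aggregation granularity,
    assuming (hypothetically) that every parent thread launches one child."""
    warps_per_block = math.ceil(threads_per_block / WARP_SIZE)
    return {
        "none": num_blocks * threads_per_block,  # one launch per parent thread
        "warp": num_blocks * warps_per_block,    # one launch per warp
        "block": num_blocks,                     # one launch per thread block
        "kernel": 1,                             # one launch for the whole parent grid
    }


def aggregate(child_grid_sizes):
    """Fuse per-thread child launches into a single grid.

    child_grid_sizes[i] is the number of blocks parent thread i would have
    launched on its own (0 if it launched nothing). Returns the aggregated
    grid size and the exclusive prefix sum of the individual grid sizes.
    """
    scan = list(accumulate(child_grid_sizes, initial=0))  # len(sizes) + 1 entries
    return scan[-1], scan[:-1]


def locate(agg_block_id, offsets):
    """Map a block of the aggregated grid back to (parent thread, original block ID).

    bisect_right selects the last launch whose offset is <= agg_block_id,
    which skips parents that launched zero blocks (they share an offset
    with the launch that follows them).
    """
    parent = bisect_right(offsets, agg_block_id) - 1
    return parent, agg_block_id - offsets[parent]


if __name__ == "__main__":
    sizes = [2, 0, 3, 1]  # hypothetical child grid sizes for four parent threads
    total, offsets = aggregate(sizes)
    print(total, offsets)  # 6 [0, 2, 2, 5]
    print([locate(b, offsets) for b in range(total)])
    # [(0, 0), (0, 1), (2, 0), (2, 1), (2, 2), (3, 0)]
    print(launches_per_granularity(4, 256))
    # {'none': 1024, 'warp': 32, 'block': 4, 'kernel': 1}
```

In this sketch, coarser aggregation trades fewer launches for a larger prefix sum and a search per block; in a real CUDA version the scan and search would run on the device, and the data layout is an assumption here rather than a description of KLAP's generated code.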

Original language: English (US)
Title of host publication: MICRO 2016 - 49th Annual IEEE/ACM International Symposium on Microarchitecture
Publisher: IEEE Computer Society
ISBN (Electronic): 9781509035083
DOIs: https://doi.org/10.1109/MICRO.2016.7783716
State: Published - Dec 14 2016
Event: 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016 - Taipei, Taiwan, Province of China
Duration: Oct 15 2016 to Oct 19 2016

Publication series

Name: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Volume: 2016-December
ISSN (Print): 1072-4451

Other

Other: 49th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2016
Country: Taiwan, Province of China
City: Taipei
Period: 10/15/16 to 10/19/16

ASJC Scopus subject areas

  • Hardware and Architecture



    Hajj, I. E., Gomez-Luna, J., Li, C., Chang, L. W., Milojicic, D., & Hwu, W. M. (2016). KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism. In MICRO 2016 - 49th Annual IEEE/ACM International Symposium on Microarchitecture [7783716] (Proceedings of the Annual International Symposium on Microarchitecture, MICRO; Vol. 2016-December). IEEE Computer Society. https://doi.org/10.1109/MICRO.2016.7783716