Exploiting more parallelism from applications having generalized reductions on GPU architectures

Xiao Long Wu, Nady Obeid, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communications or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from users of the intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that involve the use of intermediate reduction results. The proposed transformations are often implemented by programmers in an ad-hoc manner, but to the best of our knowledge no previous work has been proposed to automate these transformations for many-core architectures. We show that the automatic transformations can result in significant speedup compared to the original code using two benchmark applications.

Original languageEnglish (US)
Title of host publicationProceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010
Pages1175-1180
Number of pages6
DOIs
StatePublished - Nov 19 2010
Event10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010 - Bradford, United Kingdom
Duration: Jun 29 2010Jul 1 2010

Publication series

NameProceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010

Other

Other10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010
CountryUnited Kingdom
CityBradford
Period6/29/107/1/10

Fingerprint

Graphics processing unit
Communication

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Software

Cite this

Wu, X. L., Obeid, N., & Hwu, W-M. W. (2010). Exploiting more parallelism from applications having generalized reductions on GPU architectures. In Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010 (pp. 1175-1180). [5577899] (Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010). https://doi.org/10.1109/CIT.2010.213

Exploiting more parallelism from applications having generalized reductions on GPU architectures. / Wu, Xiao Long; Obeid, Nady; Hwu, Wen-Mei W.

Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010. 2010. p. 1175-1180 5577899 (Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Wu, XL, Obeid, N & Hwu, W-MW 2010, Exploiting more parallelism from applications having generalized reductions on GPU architectures. in Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010., 5577899, Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010, pp. 1175-1180, 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010, Bradford, United Kingdom, 6/29/10. https://doi.org/10.1109/CIT.2010.213
Wu XL, Obeid N, Hwu W-MW. Exploiting more parallelism from applications having generalized reductions on GPU architectures. In Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010. 2010. p. 1175-1180. 5577899. (Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010). https://doi.org/10.1109/CIT.2010.213
Wu, Xiao Long ; Obeid, Nady ; Hwu, Wen-Mei W. / Exploiting more parallelism from applications having generalized reductions on GPU architectures. Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010. 2010. pp. 1175-1180 (Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010).
@inproceedings{457f4ff711ec4b3c8468711de7116b93,
title = "Exploiting more parallelism from applications having generalized reductions on GPU architectures",
abstract = "Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communications or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from users of the intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that involve the use of intermediate reduction results. The proposed transformations are often implemented by programmers in an ad-hoc manner, but to the best of our knowledge no previous work has been proposed to automate these transformations for many-core architectures. We show that the automatic transformations can result in significant speedup compared to the original code using two benchmark applications.",
author = "Wu, {Xiao Long} and Nady Obeid and Hwu, {Wen-Mei W}",
year = "2010",
month = "11",
day = "19",
doi = "10.1109/CIT.2010.213",
language = "English (US)",
isbn = "9780769541082",
series = "Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010",
pages = "1175--1180",
booktitle = "Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010",

}

TY - GEN

T1 - Exploiting more parallelism from applications having generalized reductions on GPU architectures

AU - Wu, Xiao Long

AU - Obeid, Nady

AU - Hwu, Wen-Mei W

PY - 2010/11/19

Y1 - 2010/11/19

N2 - Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communications or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from users of the intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that involve the use of intermediate reduction results. The proposed transformations are often implemented by programmers in an ad-hoc manner, but to the best of our knowledge no previous work has been proposed to automate these transformations for many-core architectures. We show that the automatic transformations can result in significant speedup compared to the original code using two benchmark applications.

AB - Reduction is a common component of many applications, but can often be the limiting factor for parallelization. Previous reduction work has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communications or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from users of the intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that involve the use of intermediate reduction results. The proposed transformations are often implemented by programmers in an ad-hoc manner, but to the best of our knowledge no previous work has been proposed to automate these transformations for many-core architectures. We show that the automatic transformations can result in significant speedup compared to the original code using two benchmark applications.

UR - http://www.scopus.com/inward/record.url?scp=78249259656&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78249259656&partnerID=8YFLogxK

U2 - 10.1109/CIT.2010.213

DO - 10.1109/CIT.2010.213

M3 - Conference contribution

AN - SCOPUS:78249259656

SN - 9780769541082

T3 - Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010

SP - 1175

EP - 1180

BT - Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010

ER -