Exploiting more parallelism from applications having generalized reductions on GPU architectures

Xiao Long Wu, Nady Obeid, Wen-Mei W Hwu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Reduction is a common component of many applications, but it is often the limiting factor for parallelization. Previous work on reductions has focused on detecting reduction idioms and parallelizing the reduction operation by minimizing data communication or exploiting more data locality. While these techniques can be useful, they are mostly limited to simple code structures. In this paper, we propose a method for exploiting more parallelism by isolating the reduction from the users of its intermediate results. The other main contribution of our work is enabling the parallelization of more complex reduction codes, including those that use intermediate reduction results. Programmers often implement the proposed transformations in an ad hoc manner, but to the best of our knowledge no previous work has automated these transformations for many-core architectures. Using two benchmark applications, we show that the automatic transformations can yield significant speedup over the original code.
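As a rough illustration of what "isolating the reduction from users of the intermediate results" can mean, the sketch below splits a loop whose running sum is consumed in every iteration into two phases: a scan that produces all intermediate sums, and a second pass with no loop-carried dependence. This is one reading of the abstract, not the paper's actual transformation; the function names `fused` and `isolated` and the multiply used as the consumer are hypothetical, and plain Python stands in for GPU code.

```python
from itertools import accumulate


def fused(a, c):
    """Original shape: the reduction and its consumer share one sequential loop."""
    total = 0
    out = []
    for x, y in zip(a, c):
        total += x             # reduction step (loop-carried dependence)
        out.append(total * y)  # consumer of the intermediate result (hypothetical)
    return out


def isolated(a, c):
    """Reduction separated from its consumers (assumed loop-fission form)."""
    # Phase 1: compute every intermediate sum up front. This is a scan
    # (prefix sum), a pattern with well-known parallel implementations.
    prefix = list(accumulate(a))
    # Phase 2: each output depends only on its own inputs, so the
    # iterations are independent and could be mapped to separate threads.
    return [p * y for p, y in zip(prefix, c)]


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    c = [10, 10, 10, 10]
    print(fused(a, c) == isolated(a, c))
```

Both functions return the same list; the difference is that the second form exposes the consumer work as independent iterations once the scan has finished.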

Original language: English (US)
Title of host publication: Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010
Pages: 1175-1180
Number of pages: 6
DOIs: https://doi.org/10.1109/CIT.2010.213
State: Published - Nov 19 2010
Event: 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010 - Bradford, United Kingdom
Duration: Jun 29 2010 to Jul 1 2010

Publication series

Name: Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010

Other

Other: 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, 10th IEEE Int. Conf. Scalable Computing and Communications, ScalCom-2010
Country: United Kingdom
City: Bradford
Period: 6/29/10 to 7/1/10

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Networks and Communications
  • Software

Cite this

Wu, X. L., Obeid, N., & Hwu, W-M. W. (2010). Exploiting more parallelism from applications having generalized reductions on GPU architectures. In Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010 (pp. 1175-1180). [5577899] (Proceedings - 10th IEEE International Conference on Computer and Information Technology, CIT-2010, 7th IEEE International Conference on Embedded Software and Systems, ICESS-2010, ScalCom-2010). https://doi.org/10.1109/CIT.2010.213