Porting optimized GPU kernels to a multi-core CPU: Computational quantum chemistry application example

Dong Ye, Alexey Titov, Volodymyr Kindratenko, Ivan Ufimtsev, Todd Martinez

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We investigate techniques for optimizing multicore CPU code backported from a highly optimized GPU kernel. We show that common sub-expression elimination and loop unrolling improve code performance on the GPU but not on the CPU. Register reuse and loop merging, on the other hand, are effective on the CPU; in combination, they improve the performance of the ported code by 16%.
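The abstract names loop merging and register reuse without showing what they look like. The following is a minimal sketch in C of the two transformations applied to a hypothetical pair of reductions. It is not the kernel from the paper: the function names `two_pass` and `merged`, the arrays `x` and `w`, and the choice of reductions are all assumptions made for illustration.

```c
#include <stddef.h>

/* Baseline (hypothetical): two separate loops over the same data.
 * Each loop reloads x[i] from memory. */
void two_pass(const double *x, const double *w, size_t n,
              double *dot, double *norm2)
{
    double d = 0.0, s = 0.0;
    for (size_t i = 0; i < n; ++i)
        d += x[i] * w[i];
    for (size_t i = 0; i < n; ++i)
        s += x[i] * x[i];
    *dot = d;
    *norm2 = s;
}

/* Loop merging: both reductions share one loop.
 * Register reuse: x[i] is loaded once into the local xi,
 * which the compiler can keep in a register for both updates. */
void merged(const double *x, const double *w, size_t n,
            double *dot, double *norm2)
{
    double d = 0.0, s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        const double xi = x[i];
        d += xi * w[i];
        s += xi * xi;
    }
    *dot = d;
    *norm2 = s;
}
```

Both functions accumulate each sum in the same order, so they should return identical results. Whether the merged form is actually faster depends on the compiler, the cache behavior, and the kernel; the 16% figure in the abstract refers to the authors' ported code, not to this sketch.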

Original language: English (US)
Title of host publication: Proceedings - 2011 Symposium on Application Accelerators in High-Performance Computing, SAAHPC 2011
Pages: 72-75
Number of pages: 4
State: Published - 2011
Event: 2011 Symposium on Application Accelerators in High-Performance Computing, SAAHPC 2011 - Knoxville, TN, United States
Duration: Jul 19 2011 to Jul 20 2011

Keywords

  • Common sub-expression elimination
  • GPU
  • Loop merging
  • OpenMP
  • Register reuse
  • Unrolling

ASJC Scopus subject areas

  • Computer Science Applications
