Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures

Hee Seok Kim, Izzat El Hajj, John Stratton, Steven Lumetta, Wen Mei Hwu

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With heterogeneous computing on the rise, executing programs efficiently on different devices from a single source code has become increasingly important. OpenCL, having a bulk-synchronous programming model, has been proposed as a framework for writing such performance-portable programs. Execution order of work-items in a program is unconstrained except at barrier synchronization events, giving some freedom to an implementation when scheduling work-items between synchronization points. Many OpenCL (and CUDA) compilers have been designed for targeting multicore CPU architectures. However, scheduling work-items in prior work has been done with primary focus on correctness and vectorization. To the best of our knowledge, no existing implementations consider the impact of work-item scheduling on data locality. We propose an OpenCL compiler that performs data-locality-centric work-item scheduling. By analyzing the memory addresses accessed in loops within a kernel, our technique can make better decisions on how to schedule work-items to construct better memory access patterns, thereby improving performance. Our approach achieves geomean speedups of 3.32× over AMD's and 1.71× over Intel's implementations on Parboil and Rodinia benchmarks.
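The scheduling freedom described in the abstract can be illustrated with a small sketch. It is not the authors' compiler: the paper analyzes kernel code statically, whereas this toy version simply enumerates the addresses a single CPU thread would touch under two candidate orders and keeps the one with more unit-stride steps. The function names, the order labels "item-major" and "loop-major", and the two index functions are all hypothetical, and the sketch assumes the loop contains no barrier, so both orders are legal.

```python
from typing import Callable, List

IndexFn = Callable[[int, int], int]  # (work_item, loop_iteration) -> element index


def trace(index: IndexFn, n_items: int, n_iters: int, order: str) -> List[int]:
    """Element indices touched when one CPU thread runs n_items work-items,
    each executing a loop of n_iters iterations, in the given order."""
    if order == "item-major":  # finish one work-item's whole loop, then the next
        return [index(i, j) for i in range(n_items) for j in range(n_iters)]
    if order == "loop-major":  # run iteration j for every work-item, then j + 1
        return [index(i, j) for j in range(n_iters) for i in range(n_items)]
    raise ValueError(f"unknown order: {order}")


def unit_stride_fraction(addresses: List[int]) -> float:
    """Fraction of consecutive accesses that move to the adjacent element,
    used here as a crude stand-in for spatial locality."""
    steps = [b - a for a, b in zip(addresses, addresses[1:])]
    return sum(1 for s in steps if s == 1) / len(steps)


def pick_schedule(index: IndexFn, n_items: int, n_iters: int) -> str:
    """Return the candidate order whose trace has the most unit-stride steps."""
    scores = {
        order: unit_stride_fraction(trace(index, n_items, n_iters, order))
        for order in ("item-major", "loop-major")
    }
    return max(scores, key=scores.get)


N_ITEMS, N_ITERS = 8, 16


def row_per_item(i: int, j: int) -> int:
    # Hypothetical kernel: each work-item walks its own contiguous row.
    return i * N_ITERS + j


def column_per_item(i: int, j: int) -> int:
    # Hypothetical kernel: adjacent work-items touch adjacent elements
    # (the layout usually written for GPU coalescing).
    return j * N_ITEMS + i


if __name__ == "__main__":
    print(pick_schedule(row_per_item, N_ITEMS, N_ITERS))
    print(pick_schedule(column_per_item, N_ITEMS, N_ITERS))
```

Both orders visit exactly the same set of elements; only the sequence differs. For the row-per-item kernel the item-major order yields a fully sequential trace, while for the column-per-item kernel the loop-major order does, which is the kind of per-kernel decision the abstract attributes to analyzing memory addresses accessed in loops.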

Original language: English (US)
Title of host publication: Proceedings of the 2015 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 257-268
Number of pages: 12
ISBN (Electronic): 9781479981618
DOIs
State: Published - Mar 3 2015
Event: 2015 IEEE/ACM International Symposium on Code Generation and Optimization, CGO 2015 - San Francisco, United States
Duration: Feb 7 2015 to Feb 11 2015

ASJC Scopus subject areas

  • Applied Mathematics
  • Control and Optimization
  • Computer Science Applications
  • Computational Theory and Mathematics
