Speeding up Nek5000 with autotuning and specialization

Jaewook Shin, Mary W. Hall, Jacqueline Chame, Chun Chen, Paul F. Fischer, Paul D. Hovland

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution


Autotuning technology has emerged recently as a systematic process for evaluating alternative implementations of a computation, in order to select the best-performing solution for a particular architecture. Specialization optimizes code customized to a particular class of input data set. In this paper, we demonstrate how compiler-based autotuning that incorporates specialization for expected data set sizes of key computations can be used to speed up Nek5000, a spectral-element code. Nek5000 makes heavy use of what are effectively Basic Linear Algebra Subroutine (BLAS) calls, but for very small matrices. Through autotuning and specialization, we can achieve significant performance gains over hand-tuned libraries (e.g., Goto, ATLAS, and ACML BLAS). Additional performance gains are obtained from using higher-level compiler optimizations that aggregate multiple BLAS calls. We demonstrate more than 2.2X performance gains on an Opteron over the original manually tuned implementation, and speedups of up to 1.26X on the entire application running on 256 nodes of the Cray XT5 Jaguar system at Oak Ridge.
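The two ideas in the abstract, specialization to a known matrix size and empirical selection among alternative implementations, can be illustrated with a toy sketch. This is not the paper's compiler-based system, which generates and tunes compiled code; it is a minimal Python analogy under stated assumptions. All names here (`matmul_generic`, `make_specialized`, `autotune`) are hypothetical and chosen for illustration only, and the 4x4x4 problem size is an arbitrary stand-in for a "very small matrix".

```python
import timeit


def matmul_generic(a, b, m, k, n):
    """Generic triple loop: C (m x n) = A (m x k) * B (k x n), row-major flat lists."""
    c = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            s = 0.0
            for p in range(k):
                s += a[i * k + p] * b[p * n + j]
            c[i * n + j] = s
    return c


def make_specialized(m, k, n):
    """Generate a variant specialized to fixed sizes (assumes m, k, n >= 1).

    Every loop is fully unrolled at generation time, so the resulting
    function contains only straight-line arithmetic for this one shape.
    """
    lines = ["def f(a, b):", "    return ["]
    for i in range(m):
        for j in range(n):
            terms = " + ".join(
                f"a[{i * k + p}]*b[{p * n + j}]" for p in range(k)
            )
            lines.append(f"        {terms},")
    lines.append("    ]")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["f"]


def autotune(candidates, a, b, reference, repeats=3, number=100):
    """Time each correct candidate empirically and return the fastest.

    Returns (best_name, timings), where timings maps each accepted
    candidate name to its best measured time in seconds.
    """
    timings = {}
    for name, fn in candidates.items():
        result = fn(a, b)
        # Reject any variant that does not reproduce the reference result.
        if len(result) != len(reference) or any(
            abs(x - y) > 1e-9 for x, y in zip(result, reference)
        ):
            continue
        timings[name] = min(
            timeit.repeat(lambda: fn(a, b), repeat=repeats, number=number)
        )
    best_name = min(timings, key=timings.get)
    return best_name, timings


# Hypothetical small problem size, fixed ahead of time.
M = K = N = 4
A = [float(i + 1) for i in range(M * K)]
B = [float(i % 5) for i in range(K * N)]
reference = matmul_generic(A, B, M, K, N)

candidates = {
    "generic": lambda a, b: matmul_generic(a, b, M, K, N),
    "specialized_4x4x4": make_specialized(M, K, N),
}
best_name, timings = autotune(candidates, A, B, reference)
print("selected variant:", best_name)
```

Which variant wins depends on the machine and interpreter, which is the point of measuring rather than predicting. A real autotuner would search a much larger space of variants (for example, loop orders and unroll factors) and would generate compiled code rather than Python.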

Original language: English (US)
Title of host publication: ICS'10 - 2010 International Conference on Supercomputing
Number of pages: 10
State: Published - 2010
Externally published: Yes
Event: 24th ACM International Conference on Supercomputing, ICS'10 - Tsukuba, Ibaraki, Japan
Duration: Jun 2 2010 to Jun 4 2010

Publication series

Name: Proceedings of the International Conference on Supercomputing

Other: 24th ACM International Conference on Supercomputing, ICS'10
City: Tsukuba, Ibaraki


  • autotuning
  • empirical performance tuning
  • specialization

ASJC Scopus subject areas

  • Computer Science(all)

