Characterizing communication and page usage of parallel applications for thread and data mapping

Matthias Diener, Eduardo H.M. Cruz, Laércio L. Pilla, Fabrice Dupros, Philippe O.A. Navaux

Research output: Contribution to journal › Article › peer-review


The parallelism in shared-memory systems has increased significantly with the advent and evolution of multicore processors. Current systems include several multicore and multithreaded processors with Non-Uniform Memory Access (NUMA) characteristics. These architectures require the adoption of two strategies for the efficient execution of parallel applications: (i) threads sharing data should be placed in such a way in the memory hierarchy that they execute on shared caches; and (ii) a thread should have the data that it accesses placed on the NUMA node where it is executing. We refer to these techniques as thread and data mapping, respectively. Both strategies require knowledge of the application's memory access behavior to identify the communication between threads and processes as well as their usage of memory pages. In this paper, we introduce a profiling method to establish the suitability of parallel applications for improved mappings that take the memory hierarchy into account, based on a mathematical description of their memory access behaviors. Experiments with a large set of parallel workloads that are based on a variety of parallel APIs (MPI, OpenMP, Pthreads, and MPI+OpenMP) show that most applications can benefit from improved mappings. We provide a mechanism to compute optimized thread and data mappings. Experimental results with this mechanism showed performance improvements of up to 54% (20% on average), as well as reductions of the energy consumption of up to 37% (11% on average), compared to the default mapping by the operating system. Furthermore, our results show that thread and data mapping have to be performed jointly in order to achieve optimal improvements.
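As a rough illustration of the two kinds of information the abstract refers to (communication between threads and their usage of memory pages), the following minimal sketch derives both from a memory access trace. It is not the profiling method of the paper, whose mathematical description is not given in the abstract. The trace format, the 4 KiB page size, the `min`-based communication estimate, the "node with the most accesses" placement rule, and the names `profile`, `PAGE_SIZE`, and `thread_to_node` are all assumptions made for this example.

```python
from collections import defaultdict

PAGE_SIZE = 4096  # assumed page size in bytes (hypothetical parameter)


def profile(trace, num_threads, thread_to_node):
    """Build a communication matrix and a per-page NUMA node preference.

    trace: iterable of (thread_id, address) pairs (assumed format).
    thread_to_node: list mapping each thread id to the NUMA node it runs on.
    """
    # accesses[page][thread] = number of accesses by that thread to that page
    accesses = defaultdict(lambda: defaultdict(int))
    for tid, addr in trace:
        accesses[addr // PAGE_SIZE][tid] += 1

    comm = [[0] * num_threads for _ in range(num_threads)]
    page_node = {}
    for page, per_thread in accesses.items():
        # Thread mapping input: two threads touching the same page are
        # treated as communicating; the amount is estimated (assumption)
        # as the smaller of their two access counts.
        tids = sorted(per_thread)
        for a in range(len(tids)):
            for b in range(a + 1, len(tids)):
                i, j = tids[a], tids[b]
                amount = min(per_thread[i], per_thread[j])
                comm[i][j] += amount
                comm[j][i] += amount

        # Data mapping input: prefer the NUMA node whose threads access
        # the page most often; ties go to the lowest node id.
        per_node = defaultdict(int)
        for tid, count in per_thread.items():
            per_node[thread_to_node[tid]] += count
        page_node[page] = max(sorted(per_node), key=per_node.get)

    return comm, page_node
```

In this sketch, a thread mapping policy would place thread pairs with large `comm` entries on cores that share a cache, and a data mapping policy would place each page on the node recorded in `page_node`. Because `page_node` depends on `thread_to_node`, changing the thread mapping changes the preferred page placement, which matches the abstract's point that the two mappings have to be performed jointly.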

Original language: English (US)
Pages (from-to): 18-36
Number of pages: 19
Journal: Performance Evaluation
State: Published - Jun 1 2015
Externally published: Yes


Keywords

  • Data mapping
  • Multicore
  • NUMA
  • Shared memory
  • Thread mapping

ASJC Scopus subject areas

  • Software
  • Modeling and Simulation
  • Hardware and Architecture
  • Computer Networks and Communications

