TY - JOUR
T1 - Online Thread and Data Mapping Using a Sharing-Aware Memory Management Unit
AU - Cruz, Eduardo H.M.
AU - Diener, Matthias
AU - Pilla, Laércio L.
AU - Navaux, Philippe O.A.
N1 - Funding Information:
This research received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E project, grant agreement no. 689772. It was also supported by the Coordination for the Improvement of Higher Education Personnel (CAPES), the National Council for Scientific and Technological Development (CNPq), and Intel. Authors’ addresses: E. H. M. Cruz, IFPR Campus Paranavaí, Rua José Felipe Tequinha, 1400 - Jardim das Nações, CEP 87703-536, Paranavaí, PR, Brazil; M. Diener, 4020 National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA; L. L. Pilla, Laboratoire de Recherche en Informatique, Bât 650 Ada Lovelace, Université Paris Saclay, 91405 Orsay Cedex, France; P. O. A. Navaux, Instituto de Informática - UFRGS, Caixa postal 15064, Av. Bento Gonçalves, 9500, CEP 91501-970, Porto Alegre, RS, Brazil. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2021 Copyright held by the owner/author(s). Publication rights licensed to ACM. 2376-3639/2021/01-ART16 $15.00 https://doi.org/10.1145/3433687
Publisher Copyright:
© 2021 ACM.
PY - 2021/3
Y1 - 2021/3
AB - Current and future architectures rely on thread-level parallelism to sustain performance growth. These architectures have introduced a complex memory hierarchy, with many cores organized hierarchically across multiple cache levels and NUMA nodes. Such hierarchies affect the performance and energy efficiency of parallel applications because they increase the importance of memory access locality. To improve locality, analyzing the memory access behavior of parallel applications is critical for mapping threads and data. Nevertheless, most previous work relies on indirect information about the memory accesses, or does not combine thread and data mapping, resulting in less accurate mappings. In this paper, we propose the Sharing-Aware Memory Management Unit (SAMMU), an extension to the memory management unit that allows it to detect the memory access behavior in hardware. With this information, the operating system can perform online mapping without any previous knowledge about the behavior of the application. In the evaluation with a wide range of parallel applications (NAS Parallel Benchmarks and PARSEC Benchmark Suite), performance was improved by up to 35.7% (10.0% on average) and energy efficiency was improved by up to 11.9% (4.1% on average). These improvements result from a substantial reduction of cache misses and interconnection traffic.
KW - cache memory
KW - communication
KW - data mapping
KW - data sharing
KW - memory management unit
KW - NUMA
KW - shared memory
KW - thread mapping
UR - http://www.scopus.com/inward/record.url?scp=85102895257&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85102895257&partnerID=8YFLogxK
U2 - 10.1145/3433687
DO - 10.1145/3433687
M3 - Article
AN - SCOPUS:85102895257
SN - 2376-3639
VL - 5
JO - ACM Transactions on Modeling and Performance Evaluation of Computing Systems
JF - ACM Transactions on Modeling and Performance Evaluation of Computing Systems
IS - 4
M1 - 16
ER -