TY - GEN
T1 - A sharing-aware memory management unit for online mapping in multi-core architectures
AU - Cruz, Eduardo H.M.
AU - Diener, Matthias
AU - Pilla, Laércio L.
AU - Navaux, Philippe O.A.
N1 - Funding Information:
This research received funding from the EU H2020 Programme and from MCTI/RNP-Brazil under the HPC4E project, grant agreement n. 689772. This work was also supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research project grant 99999.007556/2015-02. Additional funding was provided by CNPq and Capes.
Publisher Copyright:
© Springer International Publishing Switzerland 2016.
PY - 2016
Y1 - 2016
N2 - In modern shared-memory architectures, it is important to map threads and data in a way that increases the locality of their memory accesses, thereby improving performance and energy efficiency. Threads that access shared data should be mapped close to each other in the memory hierarchy, while the data they access should be mapped to their NUMA node, which is called sharing-aware mapping. In this paper, we propose SAMMU, which adds sharing-awareness to the memory management unit in current architectures. SAMMU analyzes the memory access behavior in hardware and provides information to the operating system so it can perform an online mapping of threads and data. In the evaluation with a wide range of parallel applications, performance was improved by up to 35.7% (13.1% on average).
AB - In modern shared-memory architectures, it is important to map threads and data in a way that increases the locality of their memory accesses, thereby improving performance and energy efficiency. Threads that access shared data should be mapped close to each other in the memory hierarchy, while the data they access should be mapped to their NUMA node, which is called sharing-aware mapping. In this paper, we propose SAMMU, which adds sharing-awareness to the memory management unit in current architectures. SAMMU analyzes the memory access behavior in hardware and provides information to the operating system so it can perform an online mapping of threads and data. In the evaluation with a wide range of parallel applications, performance was improved by up to 35.7% (13.1% on average).
UR - http://www.scopus.com/inward/record.url?scp=84984819075&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84984819075&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-43659-3_36
DO - 10.1007/978-3-319-43659-3_36
M3 - Conference contribution
AN - SCOPUS:84984819075
SN - 9783319436586
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 490
EP - 501
BT - Parallel Processing - 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016, Proceedings
A2 - Dutot, Pierre-François
A2 - Trystram, Denis
PB - Springer
T2 - 22nd International Conference on Parallel and Distributed Computing, Euro-Par 2016
Y2 - 24 August 2016 through 26 August 2016
ER -