TY - CHAP
T1 - Sharing-aware mapping and parallel architectures
AU - Cruz, Eduardo H. M.
AU - Diener, Matthias
AU - Navaux, Philippe O. A.
N1 - Publisher Copyright:
© The Author(s), under exclusive licence to Springer International Publishing AG, part of Springer Nature 2018.
PY - 2018
Y1 - 2018
N2 - Differences in memory locality between cores affect the performance of data sharing. Because parallel applications need to access shared data, a complex memory hierarchy presents challenges for mapping threads to cores and data to NUMA nodes (Wang et al., Performance analysis of thread mappings with a holistic view of the hardware resources. In: IEEE International symposium on performance analysis of systems & software (ISPASS), 2012). Threads that access a large amount of shared data should be mapped to cores that are close to each other in the memory hierarchy, while data should be mapped to the NUMA node executing the threads that access them (Ribeiro et al., Memory affinity for hierarchical shared memory multiprocessors. In: International symposium on computer architecture and high performance computing (SBAC-PAD), pp 59–66, 2009). In this way, the locality of memory accesses is improved, which leads to an increase in performance and energy efficiency. For optimal performance improvements, data and thread mapping should be performed together (Terboven et al., Data and thread affinity in OpenMP programs. In: Workshop on memory access on future processors: a solved problem? (MAW), pp 377–384, 2008).
UR - http://www.scopus.com/inward/record.url?scp=85049782714&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85049782714&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-91074-1_2
DO - 10.1007/978-3-319-91074-1_2
M3 - Chapter
AN - SCOPUS:85049782714
T3 - SpringerBriefs in Computer Science
SP - 9
EP - 17
BT - SpringerBriefs in Computer Science
PB - Springer Nature
ER -