TY - GEN
T1 - kMAF
T2 - 23rd International Conference on Parallel Architectures and Compilation Techniques, PACT 2014
AU - Diener, Matthias
AU - Cruz, Eduardo H.M.
AU - Navaux, Philippe O.A.
AU - Busse, Anselm
AU - Heiß, Hans Ulrich
PY - 2014
Y1 - 2014
N2 - One of the main challenges for parallel architectures is the increasing complexity of the memory hierarchy, which consists of several levels of private and shared caches, as well as interconnections between separate memories in NUMA machines. To make full use of this hierarchy, it is necessary to improve the locality of memory accesses by reducing accesses to remote caches and memories, and using local ones instead. Two techniques can be used to increase the memory access locality: executing threads and processes that access shared data close to each other in the memory hierarchy (thread affinity), and placing the memory pages they access on the NUMA node they are executing on (data affinity). Most related work in this area focuses on either thread or data affinity, but not both, which limits the improvements. Other mechanisms require expensive operations, such as memory access traces or binary analysis, require changes to hardware or work only on specific parallel APIs. In this paper, we introduce kMAF, a mechanism that automatically manages thread and data affinity on the kernel level. The memory access behavior of the running application is determined during its execution by analyzing its page faults. This information is used by kMAF to migrate threads and memory pages, such that the overall memory access locality is optimized. Extensive evaluation with 27 benchmarks from 4 benchmark suites shows substantial performance improvements, with results close to an oracle mechanism. Execution time was reduced by up to 35.7% (13.8% on average), while energy efficiency was improved by up to 34.6% (9.3% on average).
AB - One of the main challenges for parallel architectures is the increasing complexity of the memory hierarchy, which consists of several levels of private and shared caches, as well as interconnections between separate memories in NUMA machines. To make full use of this hierarchy, it is necessary to improve the locality of memory accesses by reducing accesses to remote caches and memories, and using local ones instead. Two techniques can be used to increase the memory access locality: executing threads and processes that access shared data close to each other in the memory hierarchy (thread affinity), and placing the memory pages they access on the NUMA node they are executing on (data affinity). Most related work in this area focuses on either thread or data affinity, but not both, which limits the improvements. Other mechanisms require expensive operations, such as memory access traces or binary analysis, require changes to hardware or work only on specific parallel APIs. In this paper, we introduce kMAF, a mechanism that automatically manages thread and data affinity on the kernel level. The memory access behavior of the running application is determined during its execution by analyzing its page faults. This information is used by kMAF to migrate threads and memory pages, such that the overall memory access locality is optimized. Extensive evaluation with 27 benchmarks from 4 benchmark suites shows substantial performance improvements, with results close to an oracle mechanism. Execution time was reduced by up to 35.7% (13.8% on average), while energy efficiency was improved by up to 34.6% (9.3% on average).
KW - cache hierarchies
KW - data affinity
KW - numa
KW - thread affinity
UR - http://www.scopus.com/inward/record.url?scp=84907075702&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84907075702&partnerID=8YFLogxK
U2 - 10.1145/2628071.2628085
DO - 10.1145/2628071.2628085
M3 - Conference contribution
AN - SCOPUS:84907075702
SN - 9781450328098
T3 - Parallel Architectures and Compilation Techniques - Conference Proceedings, PACT
SP - 277
EP - 288
BT - PACT 2014 - Proceedings of the 23rd International Conference on Parallel Architectures and Compilation Techniques
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 24 August 2014 through 27 August 2014
ER -