TY - GEN
T1 - Using the translation lookaside buffer to map threads in parallel applications based on shared memory
AU - Cruz, Eduardo H.M.
AU - Diener, Matthias
AU - Navaux, Philippe O.A.
PY - 2012
Y1 - 2012
N2 - The communication latency between the cores in multiprocessor architectures differs depending on the memory hierarchy and the interconnections. With the increase of the number of cores per chip and the number of threads per core, this difference between the communication latencies is increasing. Therefore, it is important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, the communication is implicit and occurs through accesses to shared variables. For this reason, it is difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this paper, we introduce a new light-weight mechanism to detect the communication pattern of threads using the Translation Look aside Buffer (TLB). Our mechanism relies entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time consuming task, such as simulation, is required. We evaluated our mechanism with the NAS Parallel Benchmarks (NPB) and achieved an accurate representation of the communication patterns. Using the detected communication patterns, we generated thread mappings using a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3%, reducing the number of cache misses by up to 31.1%.
AB - The communication latency between the cores in multiprocessor architectures differs depending on the memory hierarchy and the interconnections. With the increase of the number of cores per chip and the number of threads per core, this difference between the communication latencies is increasing. Therefore, it is important to map the threads of parallel applications taking into account the communication between them. In parallel applications based on the shared memory paradigm, the communication is implicit and occurs through accesses to shared variables. For this reason, it is difficult to detect the communication pattern between the threads. Traditional approaches use simulation to monitor the memory accesses performed by the application, requiring modifications to the source code and drastically increasing the overhead. In this paper, we introduce a new light-weight mechanism to detect the communication pattern of threads using the Translation Look aside Buffer (TLB). Our mechanism relies entirely on hardware features, which makes the thread mapping transparent to the programmer and allows it to be performed dynamically by the operating system. Moreover, no time consuming task, such as simulation, is required. We evaluated our mechanism with the NAS Parallel Benchmarks (NPB) and achieved an accurate representation of the communication patterns. Using the detected communication patterns, we generated thread mappings using a heuristic method based on the Edmonds graph matching algorithm. Running the applications with these mappings resulted in performance improvements of up to 15.3%, reducing the number of cache misses by up to 31.1%.
KW - Cache Misses
KW - Interconnections
KW - Parallel applications
KW - Shared memory
KW - Thread mapping
KW - TLB
KW - Translation Lookaside Buffer
UR - http://www.scopus.com/inward/record.url?scp=84866883243&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84866883243&partnerID=8YFLogxK
U2 - 10.1109/IPDPS.2012.56
DO - 10.1109/IPDPS.2012.56
M3 - Conference contribution
AN - SCOPUS:84866883243
SN - 9780769546759
T3 - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
SP - 532
EP - 543
BT - Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
T2 - 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Y2 - 21 May 2012 through 25 May 2012
ER -