TY - GEN
T1 - Memory Performance and Bottlenecks in Multicore and GPU Architectures
AU - Serpa, Matheus S.
AU - Moreira, Francis B.
AU - Navaux, Philippe O.A.
AU - Cruz, Eduardo H.M.
AU - Diener, Matthias
AU - Griebler, Dalvan
AU - Fernandes, Luiz Gustavo
N1 - Funding Information:
The authors would like to thank the University of Córdoba, the Laboratory of Toxicology and Environmental Management; COLCIENCIAS, for financing the project: "Spatial distribution of heavy metals and nutrients in flooded soils of the Mojana region: Environmental implications and recovery strategies", identified with the code 1112-569-35214 and contract number 0211-2013. We appreciate the anonymous reviewers and the Associate Editor for their valuable comments and suggestions to improve the quality of the manuscript.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/3/19
Y1 - 2019/3/19
N2 - Nowadays, there are several different architectures available not only for the industry, but also for normal consumers. Traditional multicore processors, GPUs, accelerators such as the Sunway SW26010, or even energy efficiency-driven processors such as the ARM family, present very different architectural characteristics. This wide range of characteristics presents a challenge for the developers of applications. Developers must deal with different instruction sets, memory hierarchies, or even different programming paradigms when programming for these architectures. Therefore, the same application can perform well when executing on one architecture, but poorly on another architecture. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. The related work in this area mostly focuses on a limited analysis encompassing execution time and energy. In this paper, we perform a detailed investigation on the impact of the memory subsystem of different architectures, which is one of the most important aspects to be considered. For this study, we performed experiments in the Broadwell CPU and Pascal GPU, using applications from the Rodinia benchmark suite. In this way, we were able to understand why an application performs well on one architecture and poorly on others.
AB - Nowadays, there are several different architectures available not only for the industry, but also for normal consumers. Traditional multicore processors, GPUs, accelerators such as the Sunway SW26010, or even energy efficiency-driven processors such as the ARM family, present very different architectural characteristics. This wide range of characteristics presents a challenge for the developers of applications. Developers must deal with different instruction sets, memory hierarchies, or even different programming paradigms when programming for these architectures. Therefore, the same application can perform well when executing on one architecture, but poorly on another architecture. To optimize an application, it is important to have a deep understanding of how it behaves on different architectures. The related work in this area mostly focuses on a limited analysis encompassing execution time and energy. In this paper, we perform a detailed investigation on the impact of the memory subsystem of different architectures, which is one of the most important aspects to be considered. For this study, we performed experiments in the Broadwell CPU and Pascal GPU, using applications from the Rodinia benchmark suite. In this way, we were able to understand why an application performs well on one architecture and poorly on others.
KW - Cache memory
KW - HPC
KW - Manycore systems
KW - Memory subsystem
KW - Performance evaluation
UR - http://www.scopus.com/inward/record.url?scp=85063877722&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85063877722&partnerID=8YFLogxK
U2 - 10.1109/EMPDP.2019.8671628
DO - 10.1109/EMPDP.2019.8671628
M3 - Conference contribution
AN - SCOPUS:85063877722
T3 - Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019
SP - 233
EP - 236
BT - Proceedings - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 27th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2019
Y2 - 13 February 2019 through 15 February 2019
ER -