TY - GEN
T1 - Profiling and reducing micro-architecture bottlenecks at the hardware level
AU - Moreira, Francis B.
AU - Alves, Marco A.Z.
AU - Diener, Matthias
AU - Navaux, Philippe O.A.
AU - Koren, Israel
N1 - Publisher Copyright: © 2014 IEEE.
PY - 2014/12/1
Y1 - 2014/12/1
N2 - Most mechanisms in current superscalar processors use instruction granularity information for speculation, such as branch predictors or prefetchers. However, many of these characteristics can be obtained at the basic block level, increasing the amount of code that can be covered while requiring less space to store the data. Furthermore, the code can be profiled more accurately and provide a higher variety of information by analyzing different instruction types inside a block. Because of these advantages, block-level analysis can offer more opportunities for mechanisms that use this information. For example, it is possible to integrate information about branch prediction and memory accesses to provide precise information for speculative mechanisms, increasing accuracy and performance. We propose a Block-Level Architecture Profiler (BLAP), an online mechanism that profiles bottlenecks at the micro-architectural level, such as delinquent memory loads, hard-to-predict branches and contention for functional units. BLAP works at the basic block level, providing information that can be used to reduce the impact of these bottlenecks. A prefetch dropping mechanism and a memory controller policy were developed to use the profiled information provided by BLAP. Together, these mechanisms are able to improve performance by up to 17.39% (3.90% on average). Our technique showed average gains of 13.14% when evaluated under high memory pressure due to highly aggressive prefetch.
AB - Most mechanisms in current superscalar processors use instruction granularity information for speculation, such as branch predictors or prefetchers. However, many of these characteristics can be obtained at the basic block level, increasing the amount of code that can be covered while requiring less space to store the data. Furthermore, the code can be profiled more accurately and provide a higher variety of information by analyzing different instruction types inside a block. Because of these advantages, block-level analysis can offer more opportunities for mechanisms that use this information. For example, it is possible to integrate information about branch prediction and memory accesses to provide precise information for speculative mechanisms, increasing accuracy and performance. We propose a Block-Level Architecture Profiler (BLAP), an online mechanism that profiles bottlenecks at the micro-architectural level, such as delinquent memory loads, hard-to-predict branches and contention for functional units. BLAP works at the basic block level, providing information that can be used to reduce the impact of these bottlenecks. A prefetch dropping mechanism and a memory controller policy were developed to use the profiled information provided by BLAP. Together, these mechanisms are able to improve performance by up to 17.39% (3.90% on average). Our technique showed average gains of 13.14% when evaluated under high memory pressure due to highly aggressive prefetch.
UR - http://www.scopus.com/inward/record.url?scp=84919445358&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84919445358&partnerID=8YFLogxK
U2 - 10.1109/SBAC-PAD.2014.19
DO - 10.1109/SBAC-PAD.2014.19
M3 - Conference contribution
AN - SCOPUS:84919445358
T3 - Proceedings - Symposium on Computer Architecture and High Performance Computing
SP - 222
EP - 229
BT - Proceedings - IEEE 26th International Symposium
PB - IEEE Computer Society
T2 - 26th International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD 2014
Y2 - 22 October 2014 through 24 October 2014
ER -