Abstract
Most performance-enhancing mechanisms in current processors, such as branch predictors or prefetchers, rely on program characteristics monitored at the granularity of single instructions. However, many of these characteristics can instead be obtained at the basic-block level. This coarser granularity allows a larger portion of the code to be examined, enabling more accurate profiling and a detailed analysis of the different types of instructions executed within a block. Block-level analysis can therefore benefit performance-enhancing mechanisms, as it allows us to observe how instructions influence one another and thus detect complex behavior patterns. In this paper, we present the Dynamic Block-Level Execution Profiler (DBLEP), an online mechanism that operates at the basic-block level to profile micro-architectural bottlenecks, such as delinquent memory loads, hard-to-predict branches and contention for functional units, and provides information that can be used to reduce the impact of these bottlenecks. We developed a prefetch-dropping scheme and a memory controller policy that use the code profiling information provided by DBLEP. By taking advantage of DBLEP's high profiling accuracy, these mechanisms improve processor performance by up to 18.6% (5.3% on average). We show that the performance of our mechanism is comparable to that of mechanisms operating at single-instruction granularity, while using less hardware.
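To make the idea of block-level profiling concrete, here is a minimal Python sketch. It is not DBLEP's actual hardware design. It assumes a profile table indexed by the start address of each basic block, holding counters for loads, load misses and branch mispredictions. The class `BlockProfiler`, the method names, the threshold values, and the prefetch-dropping rule are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlockProfile:
    """Hypothetical per-block counters (one table entry per basic block)."""
    executions: int = 0
    loads: int = 0
    load_misses: int = 0
    branches: int = 0
    mispredicts: int = 0


class BlockProfiler:
    """Illustrative block-level profiler; thresholds are assumed, not from DBLEP."""

    def __init__(self, miss_threshold=0.25, mispredict_threshold=0.10):
        # Table keyed by the basic block's start address (PC).
        self.table = defaultdict(BlockProfile)
        self.miss_threshold = miss_threshold
        self.mispredict_threshold = mispredict_threshold

    def record_block(self, start_pc, loads, load_misses, has_branch, mispredicted):
        """Accumulate statistics for one dynamic execution of a block."""
        p = self.table[start_pc]
        p.executions += 1
        p.loads += loads
        p.load_misses += load_misses
        if has_branch:  # a basic block ends in at most one branch
            p.branches += 1
            p.mispredicts += int(mispredicted)

    def classify(self, start_pc):
        """Return the set of bottleneck labels for a block (empty if unseen)."""
        p = self.table.get(start_pc)  # .get does not create a new entry
        if p is None:
            return set()
        labels = set()
        if p.loads and p.load_misses / p.loads >= self.miss_threshold:
            labels.add("delinquent_loads")
        if p.branches and p.mispredicts / p.branches >= self.mispredict_threshold:
            labels.add("hard_to_predict_branch")
        return labels

    def should_drop_prefetch(self, trigger_pc):
        # Hypothetical heuristic: keep prefetches triggered by blocks flagged
        # as having delinquent loads; drop all others.
        return "delinquent_loads" not in self.classify(trigger_pc)


prof = BlockProfiler()
for i in range(10):
    # Block at 0x400100: half of its loads miss; every third branch mispredicts.
    prof.record_block(0x400100, loads=4, load_misses=2,
                      has_branch=True, mispredicted=(i % 3 == 0))
prof.record_block(0x400200, loads=3, load_misses=0,
                  has_branch=True, mispredicted=False)

print(sorted(prof.classify(0x400100)))    # ['delinquent_loads', 'hard_to_predict_branch']
print(prof.should_drop_prefetch(0x400100))  # False
print(prof.should_drop_prefetch(0x400200))  # True
```

In this sketch, a single table entry summarizes all the loads and the terminating branch of a block, so the table grows with the number of blocks rather than with the number of individual load and branch instructions.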
Original language | English (US) |
---|---|
Pages (from-to) | 15-28 |
Number of pages | 14 |
Journal | Parallel Computing |
Volume | 54 |
DOIs | |
State | Published - May 1 2016 |
Externally published | Yes |
Keywords
- Basic Block Profiling
- Computer Architecture
- HPC
- Memory Hierarchy Performance
- Processor Design
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Hardware and Architecture
- Computer Networks and Communications
- Computer Graphics and Computer-Aided Design
- Artificial Intelligence