BulkCompiler: High-performance sequential consistency through cooperative compiler and hardware support

W. Ahn, S. Qi, M. Nicolaides, J. Torrellas, J. W. Lee, X. Fang, S. Midkiff, David Wong

Research output: Contribution to journal › Conference article

Abstract

A platform that supported Sequential Consistency (SC) for all codes, not only the well-synchronized ones, would simplify the task of programmers. Recently, several hardware architectures that support high-performance SC by committing groups of instructions at a time have been proposed. However, for a platform to support SC, it is insufficient that the hardware does; the compiler has to support SC as well. This paper presents the hardware-compiler interface, and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation, and to transform code to exploit the groups. Our simulation results show that BulkCompiler not only enables a whole-system SC environment, but also one that actually outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37%. The speedups come from code optimization inside software-assembled instruction groups.

Original language: English (US)
Pages (from-to): 133-144
Number of pages: 12
Journal: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
DOI: 10.1145/1669112.1669131
State: Published - Dec 1 2009
Event: 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Micro-42 - New York, NY, United States
Duration: Dec 12 2009 to Dec 16 2009

Fingerprint

Hardware
Data storage equipment

Keywords

  • Atomic region
  • Chunk-based architecture
  • Compiler optimization
  • Sequential consistency

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

BulkCompiler: High-performance sequential consistency through cooperative compiler and hardware support. / Ahn, W.; Qi, S.; Nicolaides, M.; Torrellas, J.; Lee, J. W.; Fang, X.; Midkiff, S.; Wong, David.

In: Proceedings of the Annual International Symposium on Microarchitecture, MICRO, 01.12.2009, p. 133-144.


@article{c4c710a786a043ac9607140961c7e847,
title = "BulkCompiler: High-performance sequential consistency through cooperative compiler and hardware support",
abstract = "A platform that supported Sequential Consistency (SC) for all codes, not only the well-synchronized ones, would simplify the task of programmers. Recently, several hardware architectures that support high-performance SC by committing groups of instructions at a time have been proposed. However, for a platform to support SC, it is insufficient that the hardware does; the compiler has to support SC as well. This paper presents the hardware-compiler interface, and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation, and to transform code to exploit the groups. Our simulation results show that BulkCompiler not only enables a whole-system SC environment, but also one that actually outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37{\%}. The speedups come from code optimization inside software-assembled instruction groups.",
keywords = "Atomic region, Chunk-based architecture, Compiler optimization, Sequential consistency",
author = "W. Ahn and S. Qi and M. Nicolaides and J. Torrellas and Lee, {J. W.} and X. Fang and S. Midkiff and David Wong",
year = "2009",
month = "12",
day = "1",
doi = "10.1145/1669112.1669131",
language = "English (US)",
pages = "133--144",
journal = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",
issn = "1072-4451",

}

TY - JOUR

T1 - BulkCompiler

T2 - High-performance sequential consistency through cooperative compiler and hardware support

AU - Ahn, W.

AU - Qi, S.

AU - Nicolaides, M.

AU - Torrellas, J.

AU - Lee, J. W.

AU - Fang, X.

AU - Midkiff, S.

AU - Wong, David

PY - 2009/12/1

Y1 - 2009/12/1

N2 - A platform that supported Sequential Consistency (SC) for all codes, not only the well-synchronized ones, would simplify the task of programmers. Recently, several hardware architectures that support high-performance SC by committing groups of instructions at a time have been proposed. However, for a platform to support SC, it is insufficient that the hardware does; the compiler has to support SC as well. This paper presents the hardware-compiler interface, and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation, and to transform code to exploit the groups. Our simulation results show that BulkCompiler not only enables a whole-system SC environment, but also one that actually outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37%. The speedups come from code optimization inside software-assembled instruction groups.

AB - A platform that supported Sequential Consistency (SC) for all codes, not only the well-synchronized ones, would simplify the task of programmers. Recently, several hardware architectures that support high-performance SC by committing groups of instructions at a time have been proposed. However, for a platform to support SC, it is insufficient that the hardware does; the compiler has to support SC as well. This paper presents the hardware-compiler interface, and the main compiler ideas for BulkCompiler, a simple compiler layer that works with the group-committing hardware to provide a whole-system high-performance SC platform. We introduce ISA primitives and software algorithms for BulkCompiler to drive instruction-group formation, and to transform code to exploit the groups. Our simulation results show that BulkCompiler not only enables a whole-system SC environment, but also one that actually outperforms a conventional platform that uses the more relaxed Java Memory Model by an average of 37%. The speedups come from code optimization inside software-assembled instruction groups.

KW - Atomic region

KW - Chunk-based architecture

KW - Compiler optimization

KW - Sequential consistency

UR - http://www.scopus.com/inward/record.url?scp=76749165809&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=76749165809&partnerID=8YFLogxK

U2 - 10.1145/1669112.1669131

DO - 10.1145/1669112.1669131

M3 - Conference article

AN - SCOPUS:76749165809

SP - 133

EP - 144

JO - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

JF - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

SN - 1072-4451

ER -