Tasking with out-of-order spawn in TLS chip multiprocessors: Microarchitecture and compilation

Jose Renau, James Tuck, Wei Liu, Luis Ceze, Karin Strauss, Josep Torrellas

Research output: Contribution to conference (Paper)

Abstract

Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism, including any nesting levels of both subroutines and loop iterations. Unfortunately, these environments are hard to support in decentralized CMP hardware: since tasks are spawned out-of-order and unpredictably, maintaining key TLS basics such as task ordering and efficient resource allocation is challenging. While the concept of out-of-order spawning is not new, this paper is the first to propose a set of microarchitectural mechanisms that, altogether, fundamentally enable fast TLS with out-of-order spawn in a CMP. Moreover, we develop a fully-automated TLS compiler for aggressive out-of-order spawn. With our mechanisms, a TLS CMP with four 4-issue cores achieves an average speedup of 1.30 for full SPECint 2000 applications; the corresponding speedup for in-order-only spawn is 1.04. Overall, our mechanisms unlock the potential of TLS for the toughest applications.

Original language: English (US)
Pages: 179-188
Number of pages: 10
DOI: https://doi.org/10.1145/1088149.1088173
State: Published - Dec 1 2005
Event: ICS05 - 19th ACM International Conference on Supercomputing - Cambridge, MA, United States
Duration: Jun 20 2005 to Jun 22 2005

Other

Other: ICS05 - 19th ACM International Conference on Supercomputing
Country: United States
City: Cambridge, MA
Period: 6/20/05 to 6/22/05

Fingerprint

Subroutines
Resource allocation
Hardware

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Renau, J., Tuck, J., Liu, W., Ceze, L., Strauss, K., & Torrellas, J. (2005). Tasking with out-of-order spawn in TLS chip multiprocessors: Microarchitecture and compilation. 179-188. Paper presented at ICS05 - 19th ACM International Conference on Supercomputing, Cambridge, MA, United States. https://doi.org/10.1145/1088149.1088173

@conference{ef568335ff6f4376b921059d266a76cf,
title = "Tasking with out-of-order spawn in TLS chip multiprocessors: Microarchitecture and compilation",
abstract = "Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism, including any nesting levels of both subroutines and loop iterations. Unfortunately, these environments are hard to support in decentralized CMP hardware: since tasks are spawned out-of-order and unpredictably, maintaining key TLS basics such as task ordering and efficient resource allocation is challenging. While the concept of out-of-order spawning is not new, this paper is the first to propose a set of microarchitectural mechanisms that, altogether, fundamentally enable fast TLS with out-of-order spawn in a CMP. Moreover, we develop a fully-automated TLS compiler for aggressive out-of-order spawn. With our mechanisms, a TLS CMP with four 4-issue cores achieves an average speedup of 1.30 for full SPECint 2000 applications; the corresponding speedup for in-order-only spawn is 1.04. Overall, our mechanisms unlock the potential of TLS for the toughest applications.",
author = "Jose Renau and James Tuck and Wei Liu and Luis Ceze and Karin Strauss and Josep Torrellas",
year = "2005",
month = "12",
day = "1",
doi = "10.1145/1088149.1088173",
language = "English (US)",
pages = "179--188",
note = "ICS05 - 19th ACM International Conference on Supercomputing ; Conference date: 20-06-2005 Through 22-06-2005",

}

TY - CONF

T1 - Tasking with out-of-order spawn in TLS chip multiprocessors

T2 - Microarchitecture and compilation

AU - Renau, Jose

AU - Tuck, James

AU - Liu, Wei

AU - Ceze, Luis

AU - Strauss, Karin

AU - Torrellas, Josep

PY - 2005/12/1

Y1 - 2005/12/1

AB - Chip Multiprocessors (CMPs) are flexible, high-frequency platforms on which to support Thread-Level Speculation (TLS). However, for TLS to deliver on its promise, CMPs must exploit multiple sources of speculative task-level parallelism, including any nesting levels of both subroutines and loop iterations. Unfortunately, these environments are hard to support in decentralized CMP hardware: since tasks are spawned out-of-order and unpredictably, maintaining key TLS basics such as task ordering and efficient resource allocation is challenging. While the concept of out-of-order spawning is not new, this paper is the first to propose a set of microarchitectural mechanisms that, altogether, fundamentally enable fast TLS with out-of-order spawn in a CMP. Moreover, we develop a fully-automated TLS compiler for aggressive out-of-order spawn. With our mechanisms, a TLS CMP with four 4-issue cores achieves an average speedup of 1.30 for full SPECint 2000 applications; the corresponding speedup for in-order-only spawn is 1.04. Overall, our mechanisms unlock the potential of TLS for the toughest applications.

UR - http://www.scopus.com/inward/record.url?scp=32844465384&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=32844465384&partnerID=8YFLogxK

U2 - 10.1145/1088149.1088173

DO - 10.1145/1088149.1088173

M3 - Paper

AN - SCOPUS:32844465384

SP - 179

EP - 188

ER -