Toward a cost-effective DSM organization that exploits processor-memory integration

Josep Torrellas, Liuxi Yang, Anthony Trung Nguyen

Research output: Contribution to conference › Paper

Abstract

Dramatic increases in the number of transistors that can be integrated on a VLSI chip will soon allow commodity microprocessors to include both the processor and a sizable fraction of main memory on a single chip. Distributed Shared-Memory (DSM) multiprocessors typically use the latest off-the-shelf microprocessors and will therefore be affected by this upcoming processor-memory integration. In this paper, we explore how a cache-coherent DSM machine built around Processor-In-Memory (PIM) chips might be organized cost-effectively. To take advantage of the close coupling between processor and memory, we propose tagging the memory and organizing it as a cache. In addition, commercial considerations dictate the use of off-the-shelf hardware largely designed for uniprocessors; consequently, we keep the directory control off-chip. To keep the multiprocessor cheap and simple, and to allow for reconfigurability, directory control is performed by chips identical to those used as compute nodes. As a result, the machine hardware can easily be reconfigured for computing or for coherence handling, depending on the needs of the application. We also propose a cache coherence protocol tailored to this architecture: it uses the memory very efficiently while exploiting the large caching space available. Overall, the resulting machine is simple and inexpensive; compared with the more expensive traditional organizations, it delivers performance comparable to that of Cache-Only Memory Architecture (COMA) and higher than that of Cache-Coherent Non-Uniform Memory Access (CC-NUMA).
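The abstract's central idea, tagging the on-chip memory and organizing it as a cache whose misses are resolved by off-chip directory chips, can be illustrated with a minimal Python sketch. It is not the authors' design or coherence protocol: the line size, number of sets, associativity, LRU replacement, and the class names TaggedMemory and DirectoryNode are hypothetical, chosen only to show how an address splits into a set index and a tag, and how a tag mismatch becomes a request to a directory node. Writes, invalidations, and the paper's own protocol are deliberately omitted.

```python
from collections import OrderedDict

LINE_SIZE = 64  # bytes per memory line (assumed)
NUM_SETS = 4    # sets in the tagged local memory (assumed; tiny for illustration)
WAYS = 2        # associativity (assumed)


class DirectoryNode:
    """Hypothetical off-chip directory that tracks which nodes hold each line.

    In the organization the abstract describes, this role runs on a chip identical
    to a compute node; here it is just a Python object.
    """

    def __init__(self):
        self.sharers = {}  # line number -> set of node ids holding a copy

    def fetch(self, node_id, line):
        # Record the requester as a sharer and return the line (real data elided).
        self.sharers.setdefault(line, set()).add(node_id)
        return f"data@{line:#x}"

    def evict(self, node_id, line):
        # The node dropped its copy, so stop tracking it as a sharer.
        self.sharers.get(line, set()).discard(node_id)


class TaggedMemory:
    """Local memory organized as a set-associative cache: every line carries a tag."""

    def __init__(self, node_id, directory):
        self.node_id = node_id
        self.directory = directory
        # One LRU-ordered dict per set, mapping tag -> line data.
        self.sets = [OrderedDict() for _ in range(NUM_SETS)]

    def _split(self, addr):
        # Byte address -> (line number, set index, tag).
        line = addr // LINE_SIZE
        return line, line % NUM_SETS, line // NUM_SETS

    def read(self, addr):
        line, index, tag = self._split(addr)
        ways = self.sets[index]
        if tag in ways:  # tag match: served from local tagged memory
            ways.move_to_end(tag)
            return "hit", ways[tag]
        if len(ways) >= WAYS:  # set full: evict the least recently used line
            victim_tag, _ = ways.popitem(last=False)
            self.directory.evict(self.node_id, victim_tag * NUM_SETS + index)
        data = self.directory.fetch(self.node_id, line)  # miss: ask the directory
        ways[tag] = data
        return "miss", data
```

For example, with these assumed parameters, reading address 0x100 and then 0x104 yields a miss followed by a hit, because both addresses fall in the same 64-byte line; the directory then lists node 0 as a sharer of that line until the line is evicted.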

Original language: English (US)
Pages: 15-25
Number of pages: 11
State: Published - Dec 1 1999
Event: The 6th International Symposium on High-Performance Computer Architecture (HPCA-6) - Toulouse, France
Duration: Jan 8 2000 - Jan 12 2000



ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Torrellas, J., Yang, L., & Nguyen, A. T. (1999). Toward a cost-effective DSM organization that exploits processor-memory integration (pp. 15-25). Paper presented at The 6th International Symposium on High-Performance Computer Architecture (HPCA-6), Toulouse, France.

@conference{fe77c8bf78274eefb01c76a6d0f62a36,
title = "Toward a cost-effective DSM organization that exploits processor-memory integration",
author = "Josep Torrellas and Liuxi Yang and Nguyen, {Anthony Trung}",
year = "1999",
month = "12",
day = "1",
language = "English (US)",
pages = "15--25",
note = "The 6th International Symposium on High-Performance Computer Architecture (HPCA-6) ; Conference date: 08-01-2000 Through 12-01-2000",

}

TY  - CONF
T1  - Toward a cost-effective DSM organization that exploits processor-memory integration
AU  - Torrellas, Josep
AU  - Yang, Liuxi
AU  - Nguyen, Anthony Trung
PY  - 1999/12/1
Y1  - 1999/12/1
UR  - http://www.scopus.com/inward/record.url?scp=0033348959&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0033348959&partnerID=8YFLogxK
M3  - Paper
AN  - SCOPUS:0033348959
SP  - 15
EP  - 25
ER  - 