Architecting waferscale processors-A GPU case study

Saptadeep Pal, Daniel Petrisko, Matthew Tomei, Puneet Gupta, Subramanian S. Iyer, Rakesh Kumar

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Increasing communication overheads are already threatening computer system scaling. One approach to dramatically reduce communication overheads is waferscale processing. However, waferscale processors [1], [2], [3] have been historically deemed impractical due to yield issues [1], [4] inherent to conventional integration technology. Emerging integration technologies such as Silicon-Interconnection Fabric (Si-IF) [5], [6], [7], where pre-manufactured dies are directly bonded on to a silicon wafer, may enable one to build a waferscale system without the corresponding yield issues. As such, waferscalar architectures need to be revisited. In this paper, we study if it is feasible and useful to build today's architectures at waferscale. Using a waferscale GPU as a case study, we show that while a 300 mm wafer can house about 100 GPU modules (GPM), only a much scaled down GPU architecture with about 40 GPMs can be built when physical concerns are considered. We also study the performance and energy implications of waferscale architectures. We show that waferscale GPUs can provide significant performance and energy efficiency advantages (up to 18.9x speedup and 143x EDP benefit compared against equivalent MCM-GPU based implementation on PCB) without any change in the programming model. We also develop thread scheduling and data placement policies for waferscale GPU architectures. Our policies outperform state-of-art scheduling and data placement policies by up to 2.88x (average 1.4x) and 1.62x (average 1.11x) for 24 GPM and 40 GPM cases respectively. Finally, we build the first Si-IF prototype with interconnected dies. We observe 100% of the inter-die interconnects to be successfully connected in our prototype. Coupled with the high yield reported previously for bonding of dies on Si-IF, this demonstrates the technological readiness for building a waferscale GPU architecture.

Original languageEnglish (US)
Title of host publicationProceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages250-263
Number of pages14
ISBN (Electronic)9781728114446
DOIs
StatePublished - Mar 26 2019
Event25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 - Washington, United States
Duration: Feb 16 2019Feb 20 2019

Publication series

NameProceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019

Conference

Conference25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019
CountryUnited States
CityWashington
Period2/16/192/20/19

Keywords

  • GPU
  • Silicon Interconnect Fabric
  • Waferscale Processors

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

Fingerprint Dive into the research topics of 'Architecting waferscale processors-A GPU case study'. Together they form a unique fingerprint.

  • Cite this

    Pal, S., Petrisko, D., Tomei, M., Gupta, P., Iyer, S. S., & Kumar, R. (2019). Architecting waferscale processors-A GPU case study. In Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019 (pp. 250-263). [8675211] (Proceedings - 25th IEEE International Symposium on High Performance Computer Architecture, HPCA 2019). Institute of Electrical and Electronics Engineers Inc.. https://doi.org/10.1109/HPCA.2019.00042