Baldur: A power-efficient and scalable network using all-optical switches

Mohammad Reza Jokar, Junyi Qiu, Frederic T. Chong, Lynford L. Goddard, John Michael Dallesasse, Milton Feng, Yanjing Li

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

We present the first all-optical network, Baldur, to enable power-efficient and high-speed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing on-the-fly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TL-based buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clock-less) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a low-radix, multi-stage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a non-conventional, length-based data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluate Baldur using a circuit simulator and a network simulator. Our results show that Baldur achieves up to 3,000X lower average latency while consuming 3.2X-26.4X less power than various state-of-the-art networks under a wide variety of traffic patterns and real workloads, for the scale of 1,024 server nodes. Baldur is also highly scalable, since its power per node stays relatively constant as we increase the network size to over 1 million server nodes, which corresponds to 14.6X-31.0X power improvements compared to state-of-the-art networks at this scale.
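
The effect of path multiplicity and randomness on packet drops can be illustrated with a toy model. The sketch below is not the routing algorithm or simulator from the paper; it assumes a single bufferless stage, uniformly random destinations, and a uniformly random choice among parallel links, and all function and parameter names are hypothetical. A packet that picks a link already taken in the same time slot is dropped, since there is no buffer to hold it.

```python
import random


def simulate_drop_rate(num_packets, num_ports, multiplicity, trials=2000, seed=0):
    """Estimate the fraction of packets dropped in one bufferless stage.

    Toy assumption: each packet picks a uniformly random output port and
    then a uniformly random link among `multiplicity` parallel links.
    """
    rng = random.Random(seed)
    dropped = 0
    for _ in range(trials):
        taken = set()  # (port, link) pairs already claimed in this time slot
        for _ in range(num_packets):
            choice = (rng.randrange(num_ports), rng.randrange(multiplicity))
            if choice in taken:
                dropped += 1  # no buffer, so the colliding packet is lost
            else:
                taken.add(choice)
    return dropped / (num_packets * trials)


def expected_drop_rate(num_packets, num_ports, multiplicity):
    """Closed-form expectation for the same toy model (occupancy problem)."""
    slots = num_ports * multiplicity
    survivors = slots * (1 - (1 - 1 / slots) ** num_packets)
    return 1 - survivors / num_packets


if __name__ == "__main__":
    for m in (1, 2, 4):
        sim = simulate_drop_rate(8, 8, m)
        exp = expected_drop_rate(8, 8, m)
        print(f"multiplicity={m}: simulated={sim:.3f} expected={exp:.3f}")
```

In this model, adding parallel links spreads contending packets over more slots, so the drop rate falls as multiplicity grows. The numbers say nothing about Baldur's measured behavior, which the paper evaluates with its own circuit and network simulators.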

Original language: English (US)
Title of host publication: Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 153-166
Number of pages: 14
ISBN (Electronic): 9781728161495
DOI: https://doi.org/10.1109/HPCA47549.2020.00022
State: Published - Feb 2020
Event: 26th IEEE International Symposium on High Performance Computer Architecture, HPCA 2020 - San Diego, United States
Duration: Feb 22 2020 to Feb 26 2020

Publication series

Name: Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020

Conference

Conference: 26th IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
Country: United States
City: San Diego
Period: 2/22/20 to 2/26/20

Keywords

  • All-optical network
  • Datacenter network
  • Exascale computing
  • Optical computing

ASJC Scopus subject areas

  • Artificial Intelligence
  • Hardware and Architecture
  • Safety, Risk, Reliability and Quality

Cite this

Jokar, M. R., Qiu, J., Chong, F. T., Goddard, L. L., Dallesasse, J. M., Feng, M., & Li, Y. (2020). Baldur: A power-efficient and scalable network using all-optical switches. In Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020 (pp. 153-166). [9065595] (Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/HPCA47549.2020.00022