TY - GEN
T1 - Baldur
T2 - 26th IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
AU - Jokar, Mohammad Reza
AU - Qiu, Junyi
AU - Chong, Frederic T.
AU - Goddard, Lynford L.
AU - Dallesasse, John M.
AU - Feng, Milton
AU - Li, Yanjing
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/2
Y1 - 2020/2
N2 - We present the first all-optical network, Baldur, to enable power-efficient and high-speed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing on-the-fly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TL-based buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clock-less) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a low-radix, multi-stage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a non-conventional, length-based data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluate Baldur using a circuit simulator and a network simulator. Our results show that Baldur achieves up to 3,000X lower average latency while consuming 3.2X-26.4X less power than various state-of-the art networks under a wide variety of traffic patterns and real workloads, for the scale of 1,024 server nodes. Baldur is also highly scalable, since its power per node stays relatively constant as we increase the network size to over 1 million server nodes, which corresponds to 14.6X-31.0X power improvements compared to state-of-the-art networks at this scale.
AB - We present the first all-optical network, Baldur, to enable power-efficient and high-speed communications in future exascale computing systems. The essence of Baldur is its ability to perform packet routing on-the-fly in the optical domain using an emerging technology called the transistor laser (TL), which presents interesting opportunities and challenges at the system level. Optical packet switching readily eliminates many inefficiencies associated with the crossings between optical and electrical domains. However, TL gates consume high power at the current technology node, which makes TL-based buffering and optical clock recovery impractical. Consequently, we must adopt novel (bufferless and clock-less) architecture and design approaches that are substantially different from those used in current networks. At the architecture level, we support a bufferless design by turning to techniques that have fallen out of favor for current networks. Baldur uses a low-radix, multi-stage network with a simple routing algorithm that drops packets to handle congestion, and we further incorporate path multiplicity and randomness to minimize packet drops. This design also minimizes the number of TL gates needed in each switch. At the logic design level, a non-conventional, length-based data encoding scheme is used to eliminate the need for clock recovery. We thoroughly validate and evaluate Baldur using a circuit simulator and a network simulator. Our results show that Baldur achieves up to 3,000X lower average latency while consuming 3.2X-26.4X less power than various state-of-the art networks under a wide variety of traffic patterns and real workloads, for the scale of 1,024 server nodes. Baldur is also highly scalable, since its power per node stays relatively constant as we increase the network size to over 1 million server nodes, which corresponds to 14.6X-31.0X power improvements compared to state-of-the-art networks at this scale.
KW - All-optical network
KW - Datacenter network
KW - Exascale computing
KW - Optical computing
UR - http://www.scopus.com/inward/record.url?scp=85084191331&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85084191331&partnerID=8YFLogxK
U2 - 10.1109/HPCA47549.2020.00022
DO - 10.1109/HPCA47549.2020.00022
M3 - Conference contribution
AN - SCOPUS:85084191331
T3 - Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
SP - 153
EP - 166
BT - Proceedings - 2020 IEEE International Symposium on High Performance Computer Architecture, HPCA 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 22 February 2020 through 26 February 2020
ER -