TY - GEN
T1 - Evaluating Hardware Memory Disaggregation under Delay and Contention
AU - Patke, Archit
AU - Qiu, Haoran
AU - Jha, Saurabh
AU - Venugopal, Srikumar
AU - Gazzetti, Michele
AU - Pinto, Christian
AU - Kalbarczyk, Zbigniew
AU - Iyer, Ravishankar
N1 - Funding Information:
VIII. ACKNOWLEDGEMENTS We thank the anonymous reviewers for their valuable comments that improved the paper. This work is partially supported by the National Science Foundation (NSF) under grant No. CCF 20-29049; by the IBM-ILLINOIS Center for Cognitive Computing Systems Research (C3SR), a research collaboration that is part of the IBM AI Horizon Network; and by the IBM-ILLINOIS Discovery Accelerator Institute (IIDAI).
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Hardware memory disaggregation is an emerging trend in datacenters that provides access to remote memory as part of a shared pool or unused memory on machines across the network. Memory disaggregation aims to improve memory utilization and scale memory-intensive applications. Current state-of-the-art prototypes have shown that hardware disaggregated memory is a reality at the rack-scale. However, the memory utilization benefits of memory disaggregation can only be fully realized at larger scales enabled by a datacenter-wide network. Introduction of a datacenter network results in new performance and reliability failures that may manifest as higher network latency. Additionally, sharing of the network introduces new points of contention between multiple applications. In this work, we characterize the impact of variable network latency and contention in an open-source hardware disaggregated memory prototype - ThymesisFlow. To support our characterization, we have developed a delay injection framework that introduces delays in remote memory access to emulate network latency. Based on the characterization results, we develop insights into how reliability and resource allocation mechanisms should evolve to support hardware memory disaggregation beyond rack-scale in datacenters.
KW - datacenter networks
KW - datacenters
KW - fault injection
KW - memory disaggregation
KW - remote memory
UR - http://www.scopus.com/inward/record.url?scp=85136194560&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85136194560&partnerID=8YFLogxK
DO - 10.1109/IPDPSW55747.2022.00210
M3 - Conference contribution
AN - SCOPUS:85136194560
T3 - Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
SP - 1221
EP - 1227
BT - Proceedings - 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 36th IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2022
Y2 - 30 May 2022 through 3 June 2022
ER -