TY - GEN
T1 - Demystifying a CXL Type-2 Device
T2 - 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024
AU - Ji, Houxiang
AU - Vanavasam, Srikar
AU - Zhou, Yang
AU - Xia, Qirong
AU - Huang, Jinghan
AU - Yuan, Yifan
AU - Wang, Ren
AU - Gupta, Pekon
AU - Chitlur, Bhushan
AU - Jeong, Ipoom
AU - Kim, Nam Sung
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - CXL is the latest interconnect technology built on PCIe, providing three protocols to facilitate three distinct types of devices, each with unique capabilities. Among these devices, a CXL Type-2 device has become commercially available, followed by CXL Type-3 devices. Therefore, it is timely to understand capabilities and characteristics of the CXL Type-2 device, as well as explore suitable applications. In this work, first, we delve into three key features of a CXL Type-2 device: cache-coherent device accelerator to host memory, device accelerator to device memory, and host CPU to device memory accesses. Second, using microbenchmarks, we comprehensively characterize the latency and bandwidth of these memory accesses with a CXL Type-2 device, and then compare them with those of equivalent memory accesses with comparable devices, such as emulated CXL Type-2, CXL Type-3, and PCIe devices. Lastly, as applications that exploit the unique capabilities of a CXL Type-2 device, we propose two CXL-based Linux memory optimization features: compressed RAM cache for swap (zswap) and memory deduplication (ksm). Our evaluation shows that Redis, when running with traditional CPU-based zswap and ksm, suffers from a tail latency increase of 4.5-10.3× compared to Redis running alone. While PCIe-based zswap and ksm still experience a tail latency increase of up to 8.1×, CXL-based zswap and ksm practically eliminate the tail latency increase with faster and more efficient host-device communication than PCIe-based zswap and ksm.
AB - CXL is the latest interconnect technology built on PCIe, providing three protocols to facilitate three distinct types of devices, each with unique capabilities. Among these devices, a CXL Type-2 device has become commercially available, followed by CXL Type-3 devices. Therefore, it is timely to understand capabilities and characteristics of the CXL Type-2 device, as well as explore suitable applications. In this work, first, we delve into three key features of a CXL Type-2 device: cache-coherent device accelerator to host memory, device accelerator to device memory, and host CPU to device memory accesses. Second, using microbenchmarks, we comprehensively characterize the latency and bandwidth of these memory accesses with a CXL Type-2 device, and then compare them with those of equivalent memory accesses with comparable devices, such as emulated CXL Type-2, CXL Type-3, and PCIe devices. Lastly, as applications that exploit the unique capabilities of a CXL Type-2 device, we propose two CXL-based Linux memory optimization features: compressed RAM cache for swap (zswap) and memory deduplication (ksm). Our evaluation shows that Redis, when running with traditional CPU-based zswap and ksm, suffers from a tail latency increase of 4.5-10.3× compared to Redis running alone. While PCIe-based zswap and ksm still experience a tail latency increase of up to 8.1×, CXL-based zswap and ksm practically eliminate the tail latency increase with faster and more efficient host-device communication than PCIe-based zswap and ksm.
KW - compute express link
KW - heterogeneous computing
UR - http://www.scopus.com/inward/record.url?scp=85213383946&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85213383946&partnerID=8YFLogxK
U2 - 10.1109/MICRO61859.2024.00110
DO - 10.1109/MICRO61859.2024.00110
M3 - Conference contribution
AN - SCOPUS:85213383946
T3 - Proceedings of the Annual International Symposium on Microarchitecture, MICRO
SP - 1504
EP - 1517
BT - Proceedings - 2024 57th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2024
PB - IEEE Computer Society
Y2 - 2 November 2024 through 6 November 2024
ER -