TY - GEN
T1 - Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices
AU - Sun, Yan
AU - Yuan, Yifan
AU - Yu, Zeduo
AU - Kuper, Reese
AU - Song, Chihun
AU - Huang, Jinghan
AU - Ji, Houxiang
AU - Agarwal, Siddharth
AU - Lou, Jiaqi
AU - Jeong, Ipoom
AU - Wang, Ren
AU - Ahn, Jung Ho
AU - Xu, Tianyin
AU - Kim, Nam Sung
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/10/28
Y1 - 2023/10/28
AB - The ever-growing demands for memory with larger capacity and higher bandwidth have driven recent innovations in memory expansion and disaggregation technologies based on Compute eXpress Link (CXL). In particular, CXL-based memory expansion technology has recently gained notable attention for its ability not only to economically expand memory capacity and bandwidth but also to decouple memory technologies from a specific memory interface of the CPU. However, since CXL memory devices have not been widely available, they have been emulated using DDR memory in a remote NUMA node. In this paper, for the first time, we comprehensively evaluate a true CXL-ready system based on the latest 4th-generation Intel Xeon CPU with three CXL memory devices from different manufacturers. Specifically, we run a set of microbenchmarks not only to compare the performance of true CXL memory with that of emulated CXL memory but also to analyze the complex interplay between the CPU and CXL memory in depth. This reveals important differences between emulated CXL memory and true CXL memory, some of which will compel researchers to revisit the analyses and proposals from recent work. Next, we identify opportunities for memory-bandwidth-intensive applications to benefit from the use of CXL memory. Lastly, we propose a CXL-memory-aware dynamic page allocation policy, Caption, to more efficiently use CXL memory as a bandwidth expander. We demonstrate that Caption can automatically converge to an empirically favorable percentage of pages allocated to CXL memory, which improves the performance of memory-bandwidth-intensive applications by up to 24% when compared to the default page allocation policy designed for traditional NUMA systems.
KW - Compute eXpress Link
KW - measurement
KW - tiered-memory management
UR - http://www.scopus.com/inward/record.url?scp=85180301501&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85180301501&partnerID=8YFLogxK
U2 - 10.1145/3613424.3614256
DO - 10.1145/3613424.3614256
M3 - Conference contribution
AN - SCOPUS:85180301501
T3 - Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
SP - 105
EP - 121
BT - Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
PB - Association for Computing Machinery
T2 - 56th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO 2023
Y2 - 28 October 2023 through 1 November 2023
ER -