TY - GEN
T1 - Cloak
T2 - 36th ACM International Conference on Supercomputing, ICS 2022
AU - Kokolis, Apostolos
AU - Mantri, Namrata
AU - Ganapathy, Shrikanth
AU - Torrellas, Josep
AU - Kalamatianos, John
N1 - Publisher Copyright: © 2022 ACM.
PY - 2022/6/28
Y1 - 2022/6/28
AB - The increased memory demands of workloads are putting high pressure on Last Level Caches (LLCs). In general, there is limited opportunity to increase the capacity of LLCs due to the area and power requirements of the underlying SRAM technology. Interestingly, emerging Non-Volatile Memory (NVM) technologies promise a feasible alternative to SRAM for LLCs due to their higher area density. However, NVMs have substantially higher read and write latencies, which offset their density benefit. Although researchers have proposed methods to tolerate NVM's higher write latency, little emphasis has been placed on the critical NVM read latency. To address this problem, this paper proposes Cloak. Cloak exploits page-level data reuse in the LLC to hide NVM read latency. Specifically, on certain L1 DTLB misses, Cloak transfers LLC-resident data belonging to the TLB-missing page from the LLC NVM array to a set of small SRAM Page Buffers that will service subsequent requests to this page. Further, to enable the high-bandwidth, low-latency transfer of lines of a page to the page buffers, Cloak uses an LLC layout that accelerates the discovery of LLC-resident cache lines from the page. We evaluate Cloak with full-system simulations of a 4-core processor across 14 workloads. We find that, on average, a machine with Cloak is faster than one with an SRAM LLC by 23.8% and one with an NVM-only LLC by 8.9%, in both cases with negligible change in area. Further, Cloak reduces the ED² metric relative to these designs by 39.9% and 17.5%, respectively.
KW - Cache hierarchy
KW - Last level cache
KW - Non-volatile memory
KW - STT-RAM
UR - http://www.scopus.com/inward/record.url?scp=85132803979&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85132803979&partnerID=8YFLogxK
U2 - 10.1145/3524059.3532381
DO - 10.1145/3524059.3532381
M3 - Conference contribution
AN - SCOPUS:85132803979
T3 - Proceedings of the International Conference on Supercomputing
BT - Proceedings of the 36th ACM International Conference on Supercomputing, ICS 2022
PB - Association for Computing Machinery
Y2 - 27 June 2022 through 30 June 2022
ER -