TY - GEN
T1 - Parallel virtualized memory translation with nested elastic cuckoo page tables
AU - Stojkovic, Jovan
AU - Skarlatos, Dimitrios
AU - Kokolis, Apostolos
AU - Xu, Tianyin
AU - Torrellas, Josep
N1 - Funding Information:
This work was supported by NSF under grants CNS 1956007 and CNS 2107307. We thank Dan Tsafrir for his valuable feedback.
Publisher Copyright:
© 2022 ACM.
PY - 2022/2/28
Y1 - 2022/2/28
N2 - A major reason why nested or virtualized address translations are slow is that current systems organize page tables in a multi-level tree that is accessed sequentially. A nested translation may require up to twenty-four sequential memory accesses. To address this problem, this paper presents the first page table design that supports parallel nested address translation. The design is based on using hashed page tables (HPTs) for both guest and host. However, directly extending a native HPT design to a nested environment leads to only minor gains. Instead, our design solves a new set of challenges that appear in nested environments. Our scheme eliminates all but three of the potentially twenty-four sequential steps of a nested translation, while judiciously limiting the number of parallel memory accesses issued to avoid over-consuming cache bandwidth. As a result, compared to conventional nested radix tables, our design speeds up the execution of a set of applications by an average of 1.19x (for 4KB pages) and 1.24x (when huge pages are used). We also show a migration path from current nested radix page tables to our design.
AB - A major reason why nested or virtualized address translations are slow is that current systems organize page tables in a multi-level tree that is accessed sequentially. A nested translation may require up to twenty-four sequential memory accesses. To address this problem, this paper presents the first page table design that supports parallel nested address translation. The design is based on using hashed page tables (HPTs) for both guest and host. However, directly extending a native HPT design to a nested environment leads to only minor gains. Instead, our design solves a new set of challenges that appear in nested environments. Our scheme eliminates all but three of the potentially twenty-four sequential steps of a nested translation, while judiciously limiting the number of parallel memory accesses issued to avoid over-consuming cache bandwidth. As a result, compared to conventional nested radix tables, our design speeds up the execution of a set of applications by an average of 1.19x (for 4KB pages) and 1.24x (when huge pages are used). We also show a migration path from current nested radix page tables to our design.
KW - Page Tables
KW - Virtual Memory
KW - Virtualization
UR - http://www.scopus.com/inward/record.url?scp=85126391294&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126391294&partnerID=8YFLogxK
U2 - 10.1145/3503222.3507720
DO - 10.1145/3503222.3507720
M3 - Conference contribution
AN - SCOPUS:85126391294
T3 - International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS
SP - 84
EP - 97
BT - ASPLOS 2022 - Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
A2 - Falsafi, Babak
A2 - Ferdman, Michael
A2 - Lu, Shan
A2 - Wenisch, Thomas F.
PB - Association for Computing Machinery
T2 - 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS 2022
Y2 - 28 February 2022 through 4 March 2022
ER -