Lupin: Tolerating Partial Failures in a CXL Pod

Zhiting Zhu, Newton Ni, Yibo Huang, Yan Sun, Zhipeng Jia, Nam Sung Kim, Emmett Witchel

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

A Compute Express Link (CXL) pod is a collection of hosts attached to a CXL memory module. It provides an opportunity to port single-host shared-memory programs to run on multiple hosts in a CXL pod, where the ported application achieves higher performance than a distributed application that coordinates over the network. The cost of scaling performance on a CXL pod is that applications must tolerate partial failures, in which one process or operating system fails or reboots. Lupin is system software, comprising kernel modifications and user-level libraries, that helps applications remain available while they recover from partial failures using the contents of CXL memory.

Original language: English (US)
Title of host publication: DIMES 2024 - Proceedings of the 2nd Workshop on Disruptive Memory Systems, Part of
Subtitle of host publication: SOSP 2024
Publisher: Association for Computing Machinery
Pages: 41-50
Number of pages: 10
ISBN (Electronic): 9798400713033
State: Published - Nov 3 2024
Event: 2nd Workshop on Disruptive Memory Systems, DIMES 2024, co-located with the 30th ACM Symposium on Operating Systems Principles, SOSP 2024 - Austin, United States
Duration: Nov 3 2024 → …

Publication series

Name: DIMES 2024 - Proceedings of the 2nd Workshop on Disruptive Memory Systems, Part of: SOSP 2024

Conference

Conference: 2nd Workshop on Disruptive Memory Systems, DIMES 2024, co-located with the 30th ACM Symposium on Operating Systems Principles, SOSP 2024
Country/Territory: United States
City: Austin
Period: 11/3/24 → …

Keywords

  • CXL
  • Partial failure tolerance

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Software
