Cloud-Bursting and Autoscaling for Python-Native Scientific Workflows Using Ray

Tingkai Liu, Marquita Ellis, Carlos Costa, Claudia Misale, Sara Kokkila-Schumacher, Jinwook Jung, Gi Joon Nam, Volodymyr Kindratenko

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We have extended the Ray framework to enable automatic scaling of workloads on high-performance computing (HPC) clusters managed by SLURM© and bursting to a Cloud managed by Kubernetes®. Compared to existing HPC-Cloud convergence solutions, our framework offers several advantages: users can provide their own Cloud resources, a Python-level abstraction removes the need for users to interact with job submission systems, and a single Python-based parallel workload can run concurrently across an HPC cluster and a Cloud. Applications in Electronic Design Automation are used to demonstrate how this solution scales a workload on an on-premises HPC system and automatically bursts to a public Cloud when the allocated HPC resources run out. The paper describes the initial implementation, demonstrates the novel functionality of the proposed framework, and identifies practical considerations and limitations of using the Cloud bursting mode. The code of our framework is open-sourced.

Original language: English (US)
Title of host publication: High Performance Computing - ISC High Performance 2023 International Workshops, Revised Selected Papers
Editors: Amanda Bienz, Michèle Weiland, Marc Baboulin, Carola Kruse
Publisher: Springer
Pages: 207-220
Number of pages: 14
ISBN (Print): 9783031408427
State: Published - 2023
Event: 38th International Conference on High Performance Computing, ISC High Performance 2023 - Hamburg, Germany
Duration: May 21 2023 to May 25 2023

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13999 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 38th International Conference on High Performance Computing, ISC High Performance 2023
Country/Territory: Germany
City: Hamburg
Period: 5/21/23 to 5/25/23

Keywords

  • Cloud bursting
  • HPC
  • Kubernetes

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
